Research Papers: Design Automation

A Bayesian Sampling Method for Product Feature Extraction From Large-Scale Textual Data

Sunghoon Lim

Industrial and Manufacturing Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: slim@psu.edu

Conrad S. Tucker

Mem. ASME
Engineering Design and Industrial
and Manufacturing Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: ctucker4@psu.edu

1 Corresponding author.

Contributed by the Design Automation Committee of ASME for publication in the JOURNAL OF MECHANICAL DESIGN. Manuscript received June 29, 2015; final manuscript received March 24, 2016; published online April 20, 2016. Assoc. Editor: Gary Wang.

J. Mech. Des. 138(6), 061403 (Apr 20, 2016) (9 pages); Paper No: MD-15-1456; doi: 10.1115/1.4033238; History: Received June 29, 2015; Revised March 24, 2016

The authors of this work propose an algorithm that determines optimal search keyword combinations for querying online product data sources in order to minimize identification errors during the product feature extraction process. Data-driven product design methodologies that acquire and mine online product-feature-related data face two fundamental challenges: (1) determining optimal search keywords that return relevant product-related data and (2) determining how many search keywords are sufficient to minimize identification errors during the product feature extraction process. These challenges exist because online data, which are primarily textual, may violate statistical assumptions of independence and identical distribution for the samples returned by a query. Existing design methodologies use predetermined search terms to acquire textual data online, which makes the quality of the acquired data a function of the quality of the search terms themselves. Furthermore, the lack of independence and identical distribution of text data from online sources degrades the quality of the acquired data. For example, a designer may search for a product feature using the term “screen,” which may return relevant results such as “the screen size is just perfect,” but may also return irrelevant noise such as “researchers should really screen for this type of error.” A text mining algorithm is introduced that determines, without labeled training data, the optimal search terms that maximize the veracity of the acquired data so that valid conclusions can be drawn. A case study involving real-world smartphones is used to validate the proposed methodology.
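
The trade-off behind the two challenges can be illustrated with a small sketch. This is not the authors' algorithm: it is a toy example that assumes plain substring matching as the query mechanism and a hand-labeled four-sentence corpus (the first two sentences come from the abstract, the other two and all labels are invented). The function name `f1_for_query` and the constant `CORPUS` are hypothetical. It only shows how adding a search keyword can remove an irrelevant match while also dropping a relevant one, which is why the number of keywords matters.

```python
# Toy illustration only. Substring matching and the labeled corpus are
# assumptions made for this sketch, not the method proposed in the paper.

def f1_for_query(keywords, corpus):
    """Retrieve sentences containing every keyword, then score the
    retrieval against the hand-assigned relevance labels."""
    retrieved = [all(k in text.lower() for k in keywords) for text, _ in corpus]
    relevant = [label for _, label in corpus]

    tp = sum(r and y for r, y in zip(retrieved, relevant))          # relevant and returned
    fp = sum(r and not y for r, y in zip(retrieved, relevant))      # noise that was returned
    fn = sum((not r) and y for r, y in zip(retrieved, relevant))    # relevant but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# (sentence, is it about the "screen" product feature?)
CORPUS = [
    ("The screen size is just perfect", True),                          # from the abstract
    ("Researchers should really screen for this type of error", False), # from the abstract
    ("The phone screen cracked after a week", True),                    # invented
    ("My new phone has great battery life", False),                     # invented
]

if __name__ == "__main__":
    for query in [("screen",), ("screen", "phone")]:
        p, r, f1 = f1_for_query(query, CORPUS)
        print(query, f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy corpus the single keyword "screen" returns every relevant sentence but also the irrelevant "screen for this type of error" sentence, while the pair ("screen", "phone") returns no noise but misses one relevant sentence. Neither query is best on both precision and recall, which is the kind of tension a keyword selection method has to resolve.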

Copyright © 2016 by ASME

Figures

Fig. 1: Overview of the proposed methodology

Fig. 2: N, R, and data containing “siri” or “ios”

Fig. 3: Term disambiguation problem and keyword recognition problem

Fig. 4: The process of the proposed methodology

Fig. 5: Average values of the F1 scores
