Research Papers: Design Automation

A Systematic Methodology Based on Word Embedding for Identifying the Relation Between Online Customer Reviews and Sales Rank

Dedy Suryadi

Enterprise Systems Optimization Laboratory,
Department of Industrial and Enterprise
Systems Engineering,
University of Illinois at Urbana-Champaign,
Urbana, IL 61801;
Industrial Engineering Department,
Parahyangan Catholic University,
Bandung 40141, Indonesia,
e-mails: suryadi2@illinois.edu;

Harrison Kim

Enterprise Systems Optimization Laboratory,
Department of Industrial and Enterprise
Systems Engineering,
University of Illinois at Urbana-Champaign,
Urbana, IL 61801
e-mail: hmkim@illinois.edu

1Corresponding author.

Contributed by the Design Automation Committee of ASME for publication in the JOURNAL OF MECHANICAL DESIGN. Manuscript received November 20, 2017; final manuscript received July 9, 2018; published online September 18, 2018. Assoc. Editor: Scott Ferguson.

J. Mech. Des. 140(12), 121403 (Sep 18, 2018) (12 pages); Paper No: MD-17-1779; doi: 10.1115/1.4040913

Online reviews are an important source of information in the buying decision process, where they serve as a basis for evaluating alternatives before a purchase decision is made. This paper proposes a methodology to reveal one of the hidden alternative evaluation processes by identifying the relation between observable online customer reviews and sales rank. The methodology combines word embedding (word2vec) with X-means clustering to produce product-feature words. It then identifies sentiment words and their intensity, determines the connections between words from a dependency tree, and finally relates variables from the reviews to the sales rank of a product through a regression model. The methodology is applied to two data sets, covering wearable technology and laptop products. The high predicted R-squared values imply that the models generalize to new data sets. Among the interesting findings are that statements of problems or issues with a product are related to better sales rank, and that many product features mentioned in the review title are significantly related to sales rank. For product designers, the significant variables in the regression models suggest product features that could be improved.
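The abstract relies on predicted R-squared as evidence that the regression models generalize. The abstract does not say how the value was computed, so the following is a minimal sketch assuming the usual leave-one-out (PRESS-based) definition, predicted R² = 1 − PRESS/SST. It uses a single predictor for brevity, whereas the paper's models use many review-derived variables; the function names and the data are hypothetical and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def total_sum_of_squares(ys):
    """Sum of squared deviations of ys from their mean (SST)."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Ordinary R^2: residuals come from a model fitted on all points."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / total_sum_of_squares(ys)


def predicted_r_squared(xs, ys):
    """Predicted R^2: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        # Refit on every observation except the i-th one.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1 - press / total_sum_of_squares(ys)


# Hypothetical data: x could be a review-derived variable, y a sales-rank measure.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

if __name__ == "__main__":
    print("R^2:          ", r_squared(xs, ys))
    print("predicted R^2:", predicted_r_squared(xs, ys))
```

Because every observation is predicted by a model that never saw it, predicted R² cannot exceed the ordinary R². A predicted value that stays close to the ordinary one is what supports a claim of generalizability, while a large gap would point to overfitting.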

Copyright © 2018 by ASME


[1] Kotler, P., and Keller, K. L., 2006, Marketing Management, 12th ed., Pearson Education, Upper Saddle River, NJ, pp. 184–199.
[2] Decker, R., and Trusov, M., 2010, “Estimating Aggregate Consumer Preferences From Online Product Reviews,” Int. J. Res. Mark., 27(4), pp. 293–307.
[3] Chevalier, J. A., and Mayzlin, D., 2006, “The Effect of Word of Mouth on Sales: Online Book Reviews,” J. Mark. Res., 43(3), pp. 345–354.
[4] Sun, M., 2012, “How Does the Variance of Product Ratings Matter?,” Manage. Sci., 58(4), pp. 696–707.
[5] Jindal, N., and Liu, B., 2007, “Review Spam Detection,” 16th International Conference on World Wide Web (WWW), Banff, AB, Canada, May 8–12, pp. 1189–1190.
[6] Guo, H., Zhu, H., Guo, Z., Zhang, X., and Su, Z., 2009, “Product Feature Categorization With Multilevel Latent Semantic Association,” 18th ACM Conference on Information and Knowledge Management (CIKM '09), Hong Kong, China, Nov. 2–6, pp. 1087–1096.
[7] Suryadi, D., and Kim, H. M., 2017, “Identifying Sentiment-Dependent Product Features From Online Reviews,” Design Computing and Cognition '16, J. S. Gero, ed., Springer International Publishing, Basel, Switzerland, pp. 685–701.
[8] Suryadi, D., and Kim, H., 2016, “Identifying the Relations Between Product Features and Sales Rank From Online Reviews,” ASME Paper No. DETC2016-60481.
[9] Chong, A. Y. L., Ch'ng, E., Liu, M. J., and Li, B., 2017, “Predicting Consumer Product Demands Via Big Data: The Roles of Online Promotional Marketing and Online Reviews,” Int. J. Prod. Res., 55(17), pp. 5142–5156.
[10] Quan, X., Wenyin, L., and Dou, W., 2011, “Longitudinal Sales Responses With Online Reviews,” 11th IEEE International Conference on Data Mining Workshops (ICDMW), Vancouver, BC, Dec. 11, pp. 103–108.
[11] Hu, M., and Liu, B., 2004, “Mining Opinion Features in Customer Reviews,” 19th National Conference on Artificial Intelligence, San Jose, CA, July 25–29, pp. 755–760.
[12] Hu, M., and Liu, B., 2004, “Mining and Summarizing Customer Reviews,” Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), Seattle, WA, Aug. 22–25, pp. 168–177.
[13] Wei, C.-P., Chen, Y.-M., Yang, C.-S., and Yang, C. C., 2010, “Understanding What Concerns Consumers: A Semantic Approach to Product Feature Extraction From Consumer Reviews,” Inf. Syst. e-Bus. Manage., 8(2), pp. 149–167.
[14] Abulaish, M., Jahiruddin, Doja, M. N., and Ahmad, T., 2009, Feature and Opinion Mining for Customer Review Summarization, Springer, Berlin, Heidelberg, pp. 219–224.
[15] Ma, B., Zhang, D., Yan, Z., and Kim, T., 2013, “An LDA and Synonym Lexicon Based Approach to Product Feature Extraction From Online Consumer Product Reviews,” J. Electron. Commerce, 14(4), pp. 304–314. https://search.proquest.com/docview/1468675628?accountid=14553
[16] Mei, Q., Ling, X., Wondra, M., Su, H., and Zhai, C., 2007, “Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs,” 16th International Conference on World Wide Web (WWW '07), Banff, AB, Canada, May 8–12, pp. 171–180.
[17] Archak, N., Ghose, A., and Ipeirotis, P. G., 2007, “Show Me the Money!: Deriving the Pricing Power of Product Features by Mining Consumer Reviews,” 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), San Jose, CA, Aug. 12–15, pp. 56–65.
[18] Kobayashi, N., Inui, K., and Matsumoto, Y., 2007, “Extracting Aspect-Evaluation and Aspect-of Relations in Opinion Mining,” Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, pp. 1065–1074. http://aclweb.org/anthology/D07-1114
[19] Netzer, O., Feldman, R., Goldenberg, J., and Fresko, M., 2012, “Mine Your Own Business: Market-Structure Surveillance Through Text Mining,” Mark. Sci., 31(3), pp. 521–543.
[20] Archak, N., Ghose, A., and Ipeirotis, P. G., 2011, “Deriving the Pricing Power of Product Features by Mining Consumer Reviews,” Manage. Sci., 57(8), pp. 1485–1509.
[21] Zhou, F., Jiao, R. J., and Linsey, J. S., 2015, “Latent Customer Needs Elicitation by Use Case Analogical Reasoning From Sentiment Analysis of Online Product Reviews,” ASME J. Mech. Des., 137(7), p. 0714011.
[22] Somprasertsri, G., and Lalitrojwong, P., 2008, “Automatic Product Feature Extraction From Online Product Reviews Using Maximum Entropy With Lexical and Syntactic Features,” IEEE International Conference on Information Reuse and Integration, Las Vegas, NV, July 13–15, pp. 250–255.
[23] Somprasertsri, G., 2010, “Mining Feature-Opinion in Online Customer Reviews for Opinion Summarization,” J. Universal Comput. Sci., 16(6), pp. 938–955.
[24] Jin, W., Ho, H. H., and Srihari, R. K., 2009, “Opinionminer: A Novel Machine Learning System for Web Opinion Mining and Extraction,” 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Paris, France, June 28–July 1, pp. 1195–1204.
[25] Zhai, Z., Liu, B., Xu, H., and Jia, P., 2011, “Clustering Product Features for Opinion Mining,” Fourth ACM International Conference on Web Search and Data Mining (WSDM '11), Hong Kong, China, Feb. 9–12, pp. 347–354.
[26] Jurafsky, D., and Martin, J. H., 2009, Speech and Language Processing, 2nd ed., Pearson Education, Upper Saddle River, NJ.
[27] Cambria, E., Poria, S., Bajpai, R., and Schuller, B. W., 2016, “Senticnet 4: A Semantic Resource for Sentiment Analysis Based on Conceptual Primitives,” 26th International Conference on Computational Linguistics (COLING), Osaka, Japan, Dec. 11–16, pp. 2666–2677.
[28] Kim, S.-M., and Hovy, E., 2006, “Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text,” Workshop on Sentiment and Subjectivity in Text (SST '06), Sydney, Australia, July 22, Association for Computational Linguistics, pp. 1–8.
[29] Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J., 2013, “Distributed Representations of Words and Phrases and Their Compositionality,” e-print.
[30] Zhang, D., Xu, H., Su, Z., and Xu, Y., 2015, “Chinese Comments Sentiment Classification Based on word2vec and SVMperf,” Expert Syst. Appl., 42(4), pp. 1857–1863.
[31] Levy, O., and Goldberg, Y., 2014, “Dependency-Based Word Embeddings,” 52nd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Baltimore, MD, June 23–25, pp. 302–308. http://www.aclweb.org/anthology/P14-2050
[32] Rong, X., 2014, “word2vec Parameter Learning Explained,” e-print.
[33] Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., and McClosky, D., 2014, “The Stanford CoreNLP Natural Language Processing Toolkit,” Association for Computational Linguistics (ACL) System Demonstrations, Baltimore, MD, pp. 55–60.
[34] Řehůřek, R., and Sojka, P., 2010, “Software Framework for Topic Modelling With Large Corpora,” LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, May 22, pp. 45–50.
[35] Pelleg, D., and Moore, A. W., 2000, “X-Means: Extending k-Means With Efficient Estimation of the Number of Clusters,” 17th International Conference on Machine Learning (ICML '00), San Francisco, CA, June 29–July 2, pp. 727–734. https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf
[36] Quan, C., and Ren, F., 2014, “Unsupervised Product Feature Extraction for Feature-Oriented Opinion Determination,” Inf. Sci., 272, pp. 16–28.
[37] Cambria, E., Hussain, A., Havasi, C., and Eckl, C., 2009, “Affectivespace: Blending Common Sense and Affective Knowledge to Perform Emotive Reasoning,” WOMSA09, Seville, Spain, pp. 32–41. http://dig.csail.mit.edu/2010/DIG_Seminar/erik_talk/AffectiveSpace.pdf
[38] Forman, C., Ghose, A., and Wiesenfeld, B., 2008, “Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets,” Inf. Syst. Res., 19(3), pp. 291–313.
[39] Miller, A., 2002, Subset Selection in Regression, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL.


Fig. 1: Five-stage model of buying decision process

Fig. 2: Relations between adjectives and nouns in: (a) a sentence without negation and (b) a sentence with negation

Fig. 3: Skip-gram model (Source: [32])

Fig. 4: The flowchart of the proposed methodology

Fig. 5: Word assignment into clusters: (a) before adjustment and (b) after adjustment

Fig. 6: Connecting adjective (JJ) to nouns in a sentence: (a) direct child, (b) direct parent, (c-1) no relations found, so the search continues to (c-2) and (c-3) by moving the JJ toward the root; (c-2) indirect parent; (c-3) indirect child

Fig. 7: Conversion into regression variables
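The search described in the Fig. 6 caption can be sketched as a walk up a dependency tree. This is one possible reading of the caption and not the authors' implementation. The function name, the tree encoding (a list of parent indices), and the choice to check children before the parent at each level are all assumptions made for illustration. Negation, the subject of Fig. 2, is not handled.

```python
# Penn Treebank noun tags; the caption uses JJ for adjectives in the same tag set.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def link_adjective(index, heads, tags):
    """Connect the adjective at `index` to nouns in a dependency tree.

    heads[i] is the index of token i's parent, or None for the root.
    tags[i] is the part-of-speech tag of token i.
    Returns (noun_indices, relation).
    """
    node = index
    prefix = "direct"
    while node is not None:
        # Nouns that hang below the current node.
        noun_children = [i for i, head in enumerate(heads)
                         if head == node and tags[i] in NOUN_TAGS]
        if noun_children:
            return noun_children, prefix + " child"
        parent = heads[node]
        if parent is not None and tags[parent] in NOUN_TAGS:
            return [parent], prefix + " parent"
        # No relation found at this level: move toward the root and retry.
        node = parent
        prefix = "indirect"
    return [], "none"


# Hypothetical hand-written parses (not produced by a parser).
# "The battery is great": 'great' is the root, 'battery' is its subject.
tags_a = ["DT", "NN", "VBZ", "JJ"]
heads_a = [1, 3, 3, None]

# "It has a great battery": 'great' modifies 'battery'.
tags_b = ["PRP", "VBZ", "DT", "JJ", "NN"]
heads_b = [1, None, 4, 4, 1]

# "The screen looks really nice and bright": 'bright' attaches to 'nice',
# which attaches to 'looks', whose subject is 'screen'.
tags_c = ["DT", "NN", "VBZ", "RB", "JJ", "CC", "JJ"]
heads_c = [1, 2, None, 4, 2, 4, 4]

if __name__ == "__main__":
    print(link_adjective(3, heads_a, tags_a))  # ([1], 'direct child')
    print(link_adjective(3, heads_b, tags_b))  # ([4], 'direct parent')
    print(link_adjective(6, heads_c, tags_c))  # ([1], 'indirect child')
```

In the third sentence neither the children nor the parent of "bright" is a noun, so the search moves up through "nice" to "looks" and finds "screen" there, which corresponds to the indirect case in the caption.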


