Research Papers: Design Theory and Methodology

When Crowdsourcing Fails: A Study of Expertise on Crowdsourced Design Evaluation

Alex Burnap

Design Science,
University of Michigan,
Ann Arbor, MI 48109
e-mail: aburnap@umich.edu

Yi Ren

Research Fellow
Department of Mechanical Engineering,
University of Michigan,
Ann Arbor, MI 48109
e-mail: yiren@umich.edu

Richard Gerth

Research Scientist
National Automotive Center,
Warren, MI 48397
e-mail: richard.j.gerth.civ@mail.mil

Giannis Papazoglou

Department of Mechanical Engineering,
Cyprus University of Technology,
Limassol 3036, Cyprus
e-mail: papazogl@umich.edu

Richard Gonzalez

Department of Psychology,
University of Michigan,
Ann Arbor, MI 48109
e-mail: gonzo@umich.edu

Panos Y. Papalambros

Fellow ASME
Department of Mechanical Engineering,
University of Michigan,
Ann Arbor, MI 48109
e-mail: pyp@umich.edu

1Corresponding author.

Contributed by the Design Theory and Methodology Committee of ASME for publication in the JOURNAL OF MECHANICAL DESIGN. Manuscript received April 29, 2014; final manuscript received November 6, 2014; published online January 9, 2015. Assoc. Editor: Jonathan Cagan.

This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. Approved for public release; distribution is unlimited.

J. Mech. Des. 137(3), 031101 (Mar 01, 2015) (9 pages) Paper No: MD-14-1264; doi: 10.1115/1.4029065 History: Received April 29, 2014; Revised November 06, 2014; Online January 09, 2015

Crowdsourced evaluation is a promising method of evaluating engineering design attributes that require human input. The challenge is to correctly estimate scores using a massive and diverse crowd, particularly when only a small subset of evaluators has the expertise to give correct evaluations. Since averaging evaluations across all evaluators will result in an inaccurate crowd evaluation, this paper benchmarks a crowd consensus model that aims to identify experts such that their evaluations may be given more weight. Simulation results indicate this crowd consensus model outperforms averaging when it correctly identifies experts in the crowd, under the assumption that only experts have consistent evaluations. However, empirical results from a real human crowd indicate this assumption may not hold even on a simple engineering design evaluation task, as clusters of consistently wrong evaluators are shown to exist along with the cluster of experts. This suggests that both averaging evaluations and a crowd consensus model that relies only on evaluations may not be adequate for engineering design tasks, accordingly calling for further research into methods of finding experts within the crowd.
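The gap between plain averaging and expertise-weighted aggregation can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's Bayesian network: experts are assumed to score near the true value, non-experts are assumed to guess uniformly on [0, 1], and the weights are supplied by hand as if expertise were already known (which is exactly what a consensus model would have to infer from the evaluations). All function names, noise levels, and weights here are hypothetical.

```python
import random


def simulate_evaluations(true_score, is_expert, rng, expert_noise=0.02):
    """Draw one evaluation in [0, 1] per evaluator.

    Assumption for illustration: experts answer near the true score,
    non-experts guess uniformly at random.
    """
    evaluations = []
    for expert in is_expert:
        if expert:
            score = rng.gauss(true_score, expert_noise)
            evaluations.append(min(1.0, max(0.0, score)))  # clip to [0, 1]
        else:
            evaluations.append(rng.uniform(0.0, 1.0))
    return evaluations


def plain_average(evaluations):
    """Treat every evaluator equally."""
    return sum(evaluations) / len(evaluations)


def weighted_average(evaluations, weights):
    """Weight each evaluation; higher weight means more trusted."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, evaluations)) / total


if __name__ == "__main__":
    rng = random.Random(0)
    true_score = 0.9
    is_expert = [True] * 5 + [False] * 95  # small expert subset
    evaluations = simulate_evaluations(true_score, is_expert, rng)
    # Hypothetical hand-set weights standing in for inferred expertise.
    weights = [1.0 if expert else 0.05 for expert in is_expert]
    print("averaging error:", abs(plain_average(evaluations) - true_score))
    print("weighted error: ", abs(weighted_average(evaluations, weights) - true_score))
```

With only a few experts in a large crowd, the plain average is pulled toward the middle of the scale by the random guessers, while the weighted estimate stays closer to the true score. The sketch does not cover the failure mode reported in the abstract, where consistently wrong evaluators would be mistaken for experts.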

Copyright © 2015 by ASME

Fig. 1

Graphical representation of the Bayesian network crowd consensus model. This model describes a crowd of evaluators making evaluations r_pd that deviate from the true score Φ_d. Each evaluator has an expertise a_p and each design has a difficulty d_d. The gray shading on the evaluation r_pd denotes that it is the only observed data in this model.

Fig. 2

(a) Low evaluation expertise (dashed) relative to the design evaluation difficulty results in an almost uniform distribution of an evaluator's evaluation response, while high evaluation expertise (dotted) results in evaluators making evaluations closer to the true score. (b) An evaluator's evaluation error variance σ_pd^2 as a function of that evaluator's expertise a_p given some fixed design difficulty d_d and crowd-level parameters θ and γ.

Fig. 3

Crowd expertise distributions for Cases I and II, which test how the expertise of evaluators within the crowd affects evaluation error for homogeneous and heterogeneous crowds, respectively. Three possible sample crowds are shown for both cases.

Fig. 4

Case I: Design evaluation error from the averaging and Bayesian network methods as a function of average evaluator expertise for homogeneous crowds. This plot shows that, for homogeneous crowds, aggregating the set of evaluations into the crowd's consensus score sees only marginal benefits from using the Bayesian network, and only in the 0.4–0.7 range of evaluator expertise.

Fig. 5

Case II: Design evaluation error over a set of designs for a mixed crowd with low average evaluation expertise. As the crowd's variance of expertise increases, a growing proportion of high-expertise evaluators is present within the crowd. This leads to a point where the Bayesian network is able to identify the cluster of high-expertise evaluators, at which point evaluation error drops to zero.

Fig. 6

(a) Boundary conditions for bracket strength evaluation and (b) the set of all eight bracket designs.

Fig. 7

Clustering of evaluators based on how similar their evaluations are across all eight designs. Each black or colored point represents an individual evaluator: colored points represent evaluators who were similar to at least three other evaluators, and black points represent evaluators who tended to evaluate more uniquely.
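The "similar to at least three other evaluators" rule in the Fig. 7 caption can be sketched as a simple neighbor count. The caption does not state the similarity measure or cutoff the authors used, so the Euclidean distance and the `radius` threshold below are assumptions, and the function names are hypothetical.

```python
import math


def count_similar(evaluations, radius):
    """For each evaluator, count how many others lie within `radius`.

    `evaluations` holds one score vector per evaluator (one score per
    design). Euclidean distance is an assumed similarity measure.
    """
    counts = []
    for i, a in enumerate(evaluations):
        n = sum(
            1
            for j, b in enumerate(evaluations)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(n)
    return counts


def in_cluster(evaluations, radius, min_similar=3):
    """Flag evaluators who are similar to at least `min_similar` others."""
    return [n >= min_similar for n in count_similar(evaluations, radius)]


if __name__ == "__main__":
    # Four evaluators who agree on all eight designs, plus one outlier.
    crowd = [[0.1] * 8 for _ in range(4)] + [[0.9] * 8]
    print(in_cluster(crowd, radius=0.5))
```

Note that a flagged group is only consistent, not necessarily correct: a cluster of evaluators who agree with one another can still be far from the true scores.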

Fig. 8

Design evaluation error with respect to additional experts

Fig. 9

Design evaluation error with respect to the proportion of the expert group



