Research Papers: Design Automation

Learning an Optimization Algorithm Through Human Design Iterations

Author and Article Information
Thurston Sexton

National Institute of Standards and Technology,
Gaithersburg, MD 20899
e-mail: thurston.sexton@nist.gov

Max Yi Ren

Department of Mechanical Engineering,
Arizona State University,
Tempe, AZ 85287
e-mail: yiren@asu.edu

Contributed by the Design Automation Committee of ASME for publication in the JOURNAL OF MECHANICAL DESIGN. Manuscript received December 1, 2016; final manuscript received July 19, 2017; published online August 30, 2017. Assoc. Editor: Carolyn Seepersad.

J. Mech. Des. 139(10), 101404 (Aug 30, 2017) (10 pages); Paper No: MD-16-1804; doi: 10.1115/1.4037344. History: Received December 01, 2016; Revised July 19, 2017

Solving optimal design problems through crowdsourcing faces a dilemma: On the one hand, human beings have been shown to be more effective than algorithms at searching for good solutions to certain real-world problems with high-dimensional or discrete solution spaces; on the other hand, the cost of setting up crowdsourcing environments, the uncertainty in the crowd's domain-specific competence, and the crowd's limited commitment all contribute to the scarcity of real-world design crowdsourcing applications. We are thus motivated to investigate a solution-searching mechanism in which an optimization algorithm is tuned based on human demonstrations of solution searching, so that the search can be continued after human participants abandon the problem. To do so, we model the iterative search process as a Bayesian optimization (BO) algorithm and propose an inverse BO (IBO) algorithm to find the maximum likelihood estimators (MLEs) of the BO parameters based on human solutions. We show through a vehicle design and control problem that the search performance of BO can be improved by recovering its parameters from an effective human search. Thus, IBO has the potential to improve the success rate of design crowdsourcing activities by requiring only good search strategies, rather than good solutions, from the crowd.
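The core computation behind IBO can be illustrated with a small sketch: treat each human query as a noisy maximizer of an acquisition function under a Gaussian process (GP) surrogate, and pick the kernel parameter that makes the observed query sequence most likely. The softmax choice model, the toy objective, and the names `expected_improvement`, `neg_log_likelihood`, and `beta` below are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative inverse-BO sketch: fit a GP length scale by maximizing the
# likelihood of an observed (human) query sequence under a softmax-over-EI
# choice model. Assumptions: softmax model, toy objective, discrete grid.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(mu, sigma, y_best):
    """EI for minimization, from the GP posterior mean/std at candidates."""
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def neg_log_likelihood(length_scale, X, y, grid, beta=50.0):
    """-log P(observed queries | GP length scale) under a softmax over EI."""
    nll = 0.0
    for t in range(2, len(X)):  # need at least two points to fit the GP
        gp = GaussianProcessRegressor(kernel=RBF(length_scale),
                                      optimizer=None).fit(X[:t], y[:t])
        mu, sigma = gp.predict(grid, return_std=True)
        ei = expected_improvement(mu, sigma, y[:t].min())
        logits = beta * ei
        logits -= logits.max()                       # numerical stability
        j = np.abs(grid.ravel() - X[t, 0]).argmin()  # human's t-th query
        nll -= logits[j] - np.log(np.exp(logits).sum())
    return nll

f = lambda x: np.sin(3 * x) + 0.5 * x                # toy objective
X = np.array([[0.2], [2.9], [1.6], [4.1], [3.6]])    # hypothetical queries
y = f(X).ravel()
grid = np.linspace(0, 5, 200).reshape(-1, 1)
res = minimize_scalar(neg_log_likelihood, bounds=(0.05, 5.0),
                      args=(X, y, grid), method="bounded")
print("MLE length scale:", res.x)
```

Under this reading, the recovered length scale would then initialize an ordinary BO run that continues the search where the human left off.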

Copyright © 2017 by ASME

References

Cooper, S. , Khatib, F. , Treuille, A. , Barbero, J. , Lee, J. , Beenen, M. , Leaver-Fay, A. , Baker, D. , and Popović, Z. , 2010, “ Solve Puzzles for Science,” Foldit, University of Washington, Seattle, WA, accessed July 26, 2017, http://fold.it
Khatib, F. , Cooper, S. , Tyka, M. D. , Xu, K. , Makedon, I. , Popović, Z. , Baker, D. , and Players, F. , 2011, “ Algorithm Discovery by Protein Folding Game Players,” Proc. Natl. Acad. Sci., 108(47), pp. 18949–18953. [CrossRef]
Lee, J. , Kladwang, W. , Lee, M. , Cantu, D. , Azizyan, M. , Kim, H. , Limpaecher, A. , Yoon, S. , Treuille, A. , and Das, R. , 2014, “ Solve Puzzles. Invent Medicine,” Eterna, Carnegie Mellon University/Stanford University, Pittsburgh, PA/Stanford, CA, accessed July 26, 2017, http://eterna.cmu.edu
Lee, J. , Kladwang, W. , Lee, M. , Cantu, D. , Azizyan, M. , Kim, H. , Limpaecher, A. , Yoon, S. , Treuille, A. , and Das, R. , 2014, “ RNA Design Rules From a Massive Open Laboratory,” Proc. Natl. Acad. Sci., 111(6), pp. 2122–2127. [CrossRef]
Kawrykow, A. , Roumanis, G. , Kam, A. , Kwak, D. , Leung, C. , Wu, C. , Zarour, E. , Sarmenta, L. , Blanchette, M. , and Waldispühl, J. , 2012, “ Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment,” PLoS One, 7(3), p. e31362. [CrossRef] [PubMed]
Sung, J. , Jin, S. H. , and Saxena, A. , 2015, “ Robobarista: Object Part Based Transfer of Manipulation Trajectories From Crowd-Sourcing in 3D Pointclouds,” preprint arXiv:1504.03071. https://arxiv.org/abs/1504.03071
Le Bras, R. , Bernstein, R. , Gomes, C. P. , Selman, B. , and Van Dover, R. B. , 2013, “ Crowdsourcing Backdoor Identification for Combinatorial Optimization,” 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, Aug. 3–9, pp. 2840–2847. https://pdfs.semanticscholar.org/fdfb/1a3e026b8d57487c1e54ea044494a1056df6.pdf
Ren, Y. , Bayrak, A. E. , and Papalambros, P. Y. , 2016, “ EcoRacer: Game-Based Optimal Electric Vehicle Design and Driver Control Using Human Players,” ASME J. Mech. Des., 138(6), p. 061407. [CrossRef]
Schrope, M. , 2013, “ Solving Tough Problems With Games,” Proc. Natl. Acad. Sci., 110(18), pp. 7104–7106. [CrossRef]
Lake, B. M. , Ullman, T. D. , Tenenbaum, J. B. , and Gershman, S. J. , 2016, “ Building Machines That Learn and Think Like People,” preprint arXiv:1604.00289. https://arxiv.org/abs/1604.00289
Ren, Y. , Bayrak, A. E. , and Papalambros, P. Y. , 2015, “ EcoRacer: Game-Based Optimal Electric Vehicle Design and Driver Control Using Human Players,” ASME Paper No. DETC2015-46836.
Jones, D. , Schonlau, M. , and Welch, W. , 1998, “ Efficient Global Optimization of Expensive Black-Box Functions,” J. Global Optim., 13(4), pp. 455–492. [CrossRef]
Brochu, E. , Cora, V. M. , and De Freitas, N. , 2010, “ A Tutorial on Bayesian Optimization of Expensive Cost Functions, With Application to Active User Modeling and Hierarchical Reinforcement Learning,” preprint arXiv:1012.2599. https://arxiv.org/abs/1012.2599
Rasmussen, C. E. , and Williams, C. K. I. , 2006, Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.
Lucas, C. G. , Griffiths, T. L. , Williams, J. J. , and Kalish, M. L. , 2015, “ A Rational Model of Function Learning,” Psychon. Bull. Rev., 22(5), pp. 1193–1215. [CrossRef] [PubMed]
Wilson, A. G. , Dann, C. , Lucas, C. , and Xing, E. P. , 2015, “ The Human Kernel,” Advances in Neural Information Processing Systems (NIPS), Montreal, QC, Canada, Dec. 7–12, pp. 2854–2862. https://papers.nips.cc/paper/5765-the-human-kernel.pdf
Rasmussen, C. E. , and Ghahramani, Z. , 2001, “ Occam's Razor,” Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, Dec. 3–8, pp. 294–300. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.5075
Borji, A. , and Itti, L. , 2013, “ Bayesian Optimization Explains Human Active Search,” Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, Dec. 5–10, pp. 55–63. http://dl.acm.org/citation.cfm?id=2999611.2999618
Levine, S. , Popovic, Z. , and Koltun, V. , 2011, “ Nonlinear Inverse Reinforcement Learning With Gaussian Processes,” Advances in Neural Information Processing Systems (NIPS), Granada, Spain, Dec. 12–17, pp. 19–27.
Deisenroth, M. P. , Neumann, G. , and Peters, J. , 2013, “ A Survey on Policy Search for Robotics,” Found. Trends Rob., 2(1–2), pp. 1–142.
Calandra, R. , Gopalan, N. , Seyfarth, A. , Peters, J. , and Deisenroth, M. P. , 2014, “ Bayesian Gait Optimization for Bipedal Locomotion,” International Conference on Learning and Intelligent Optimization (LION), Gainesville, FL, Feb. 16–21, pp. 274–290.
Cully, A. , Clune, J. , Tarapore, D. , and Mouret, J.-B. , 2015, “ Robots That Can Adapt Like Animals,” Nature, 521(7553), pp. 503–507. [CrossRef] [PubMed]
Pretz, J. E. , 2008, “ Intuition Versus Analysis: Strategy and Experience in Complex Everyday Problem Solving,” Mem. Cognit., 36(3), pp. 554–566. [CrossRef] [PubMed]
Linsey, J. S. , Tseng, I. , Fu, K. , Cagan, J. , Wood, K. L. , and Schunn, C. , 2010, “ A Study of Design Fixation, Its Mitigation and Perception in Engineering Design Faculty,” ASME J. Mech. Des., 132(4), p. 041003. [CrossRef]
Daly, S. R. , Yilmaz, S. , Christian, J. L. , Seifert, C. M. , and Gonzalez, R. , 2012, “ Design Heuristics in Engineering Concept Generation,” J. Eng. Educ., 101(4), pp. 601–629. [CrossRef]
Cagan, J. , Dinar, M. , Shah, J. J. , Leifer, L. , Linsey, J. , Smith, S. , and Vargas-Hernandez, N. , 2013, “ Empirical Studies of Design Thinking: Past, Present, Future,” ASME Paper No. DETC2013-13302.
Björklund, T. A. , 2013, “ Initial Mental Representations of Design Problems: Differences Between Experts and Novices,” Des. Stud., 34(2), pp. 135–160. [CrossRef]
Egan, P. , and Cagan, J. , 2016, “ Human and Computational Approaches for Design Problem-Solving,” Experimental Design Research, Springer, Cham, Switzerland, pp. 187–205. [CrossRef]
Cagan, J. , and Kotovsky, K. , 1997, “ Simulated Annealing and the Generation of the Objective Function: A Model of Learning During Problem Solving,” Comput. Intell., 13(4), pp. 534–581. [CrossRef]
Landry, L. H. , and Cagan, J. , 2011, “ Protocol-Based Multi-Agent Systems: Examining the Effect of Diversity, Dynamism, and Cooperation in Heuristic Optimization Approaches,” ASME J. Mech. Des., 133(2), p. 021001. [CrossRef]
McComb, C. , Cagan, J. , and Kotovsky, K. , 2016, “ Drawing Inspiration From Human Design Teams for Better Search and Optimization: The Heterogeneous Simulated Annealing Teams Algorithm,” ASME J. Mech. Des., 138(4), p. 044501. [CrossRef]
Thrun, S. , and Pratt, L. , 1998, “ Learning to Learn: Introduction and Overview,” Learning to Learn, Springer, Boston, MA, pp. 3–17. [CrossRef]
Wang, J. X. , Kurth-Nelson, Z. , Tirumala, D. , Soyer, H. , Leibo, J. Z. , Munos, R. , Blundell, C. , Kumaran, D. , and Botvinick, M. , 2016, “ Learning to Reinforcement Learn,” preprint arXiv:1611.05763. https://arxiv.org/abs/1611.05763
Andrychowicz, M. , Denil, M. , Gomez, S. , Hoffman, M. W. , Pfau, D. , Schaul, T. , and de Freitas, N. , 2016, “ Learning to Learn by Gradient Descent by Gradient Descent,” Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, Dec. 5–10, pp. 3981–3989. https://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent
Hansen, N. , Müller, S. D. , and Koumoutsakos, P. , 2003, “ Reducing the Time Complexity of the Derandomized Evolution Strategy With Covariance Matrix Adaptation (CMA-ES),” Evol. Comput., 11(1), pp. 1–18. [CrossRef] [PubMed]
Jones, D. R. , Perttunen, C. D. , and Stuckman, B. E. , 1993, “ Lipschitzian Optimization Without the Lipschitz Constant,” J. Optim. Theory Appl., 79(1), pp. 157–181. [CrossRef]
Sahinidis, N. V. , 1996, “ BARON: A General Purpose Global Optimization Software Package,” J. Global Optim., 8(2), pp. 201–205. [CrossRef]
Zhu, C. , Byrd, R. H. , Lu, P. , and Nocedal, J. , 1994, “ L-BFGS-B: Fortran Subroutines for Large Scale Bound Constrained Optimization,” Northwestern University, Evanston, IL, Report No. NAM-11. http://people.sc.fsu.edu/~inavon/5420a/lbfgsb.pdf
McGovern, A. , Sutton, R. S. , and Fagg, A. H. , 1997, “ Roles of Macro-Actions in Accelerating Reinforcement Learning,” Grace Hopper Celebration of Women in Computing (GHC), San Jose, CA, Sept. 19–21, Vol. 1317. https://pdfs.semanticscholar.org/6c42/70b9ca7cc63a02ddae8974322ec5ea082743.pdf
McGovern, A. , and Barto, A. G. , 2001, “ Automatic Discovery of Subgoals in Reinforcement Learning Using Diverse Density,” International Conference on Machine Learning (ICML), Williamstown, MA, June 28–July 1, p. 8. https://pdfs.semanticscholar.org/7eca/3acd1a4239d8a299478885c7c0548f3900a8.pdf
Dietterich, T. G. , 1998, “ The MAXQ Method for Hierarchical Reinforcement Learning,” 15th International Conference on Machine Learning (ICML), Madison, WI, July 24–27, pp. 118–126. https://pdfs.semanticscholar.org/fdc7/c1e10d935e4b648a32938f13368906864ab3.pdf
Kulkarni, T. D. , Narasimhan, K. R. , Saeedi, A. , and Tenenbaum, J. B. , 2016, “ Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation,” preprint arXiv:1604.06057. https://arxiv.org/abs/1604.06057
Botvinick, M. , and Weinstein, A. , 2014, “ Model-Based Hierarchical Reinforcement Learning and Human Action Control,” Philos. Trans. R. Soc. B, 369(1655), p. 20130480.
Stone, J. V. , 2004, Independent Component Analysis, Wiley, Hoboken, NJ.
Hui, M. , Li, J. , Wen, X. , Yao, L. , and Long, Z. , 2011, “ An Empirical Comparison of Information-Theoretic Criteria in Estimating the Number of Independent Components of fMRI Data,” PLoS One, 6(12), p. e29274. [CrossRef] [PubMed]
Tombu, M. N. , Asplund, C. L. , Dux, P. E. , Godwin, D. , Martin, J. W. , and Marois, R. , 2011, “ A Unified Attentional Bottleneck in the Human Brain,” Proc. Natl. Acad. Sci., 108(33), pp. 13426–13431. [CrossRef]
Ng, A. Y. , and Russell, S. J. , 2000, “ Algorithms for Inverse Reinforcement Learning,” 17th International Conference on Machine Learning (ICML), Stanford, CA, June 29–July 2, pp. 663–670. http://ai.stanford.edu/~ang/papers/icml00-irl.pdf
Ziebart, B. D. , Maas, A. L. , Bagnell, J. A. , and Dey, A. K. , 2008, “ Maximum Entropy Inverse Reinforcement Learning,” 23rd National Conference on Artificial Intelligence (AAAI), Chicago, IL, July 13–17, pp. 1433–1438. https://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf
Abbeel, P. , and Ng, A. Y. , 2004, “ Apprenticeship Learning Via Inverse Reinforcement Learning,” 21st International Conference on Machine Learning (ICML), Banff, AB, Canada, July 4–8, p. 1. http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf
Abbeel, P. , Coates, A. , and Ng, A. Y. , 2010, “ Autonomous Helicopter Aerobatics Through Apprenticeship Learning,” Int. J. Rob. Res., 29(13), pp. 1608–1639. [CrossRef]
Dvijotham, K. , and Todorov, E. , 2010, “ Inverse Optimal Control With Linearly-Solvable MDPs,” 27th International Conference on Machine Learning (ICML), Haifa, Israel, June 21–24, pp. 335–342. https://homes.cs.washington.edu/~todorov/papers/DvijothamICML10.pdf
Spelke, E. S. , Gutheil, G. , Van de Walle, G. , and Osherson, D. , 1995, “ The Development of Object Perception,” An Invitation to Cognitive Science, Vol. 2, 2nd ed., MIT Press, Cambridge, MA. [PubMed]
Baillargeon, R. , Li, J. , Ng, W. , and Yuan, S. , 2009, “ An Account of Infants' Physical Reasoning,” Learning and the Infant Mind, Oxford University Press, New York, pp. 66–116. [CrossRef]
Bates, C. J. , Yildirim, I. , Tenenbaum, J. B. , and Battaglia, P. W. , 2015, “ Humans Predict Liquid Dynamics Using Probabilistic Simulation,” 37th Annual Conference of the Cognitive Science Society (COGSCI), Pasadena, CA, July 22–25, pp. 172–177. http://www.mit.edu/~ilkery/papers/probabilistic-simulation-model.pdf
Gershman, S. J. , Horvitz, E. J. , and Tenenbaum, J. B. , 2015, “ Computational Rationality: A Converging Paradigm for Intelligence in Brains, Minds, and Machines,” Science, 349(6245), pp. 273–278. [CrossRef] [PubMed]
Fodor, J. A. , 1975, The Language of Thought, Vol. 5, Harvard University Press, Cambridge, MA.
Biederman, I. , 1987, “ Recognition-by-Components: A Theory of Human Image Understanding,” Psychol. Rev., 94(2), p. 115. [CrossRef] [PubMed]
Harlow, H. F. , 1949, “ The Formation of Learning Sets,” Psychol. Rev., 56(1), p. 51. [CrossRef] [PubMed]
Egan, P. , Cagan, J. , Schunn, C. , and LeDuc, P. , 2015, “ Synergistic Human-Agent Methods for Deriving Effective Search Strategies: The Case of Nanoscale Design,” Res. Eng. Des., 26(2), pp. 145–169. [CrossRef]
Choi, J. , and Kim, K.-E. , 2012, “ Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions,” Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, Dec. 3–8, pp. 305–313. https://papers.nips.cc/paper/4737-nonparametric-bayesian-inverse-reinforcement-learning-for-multiple-reward-functions
Ratliff, N. D. , Bagnell, J. A. , and Zinkevich, M. A. , 2006, “ Maximum Margin Planning,” 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, June 25–29, pp. 729–736. http://martin.zinkevich.org/publications/maximummarginplanning.pdf
Syed, U. , and Schapire, R. E. , 2007, “ A Game-Theoretic Approach to Apprenticeship Learning,” Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, Dec. 3–6, pp. 1449–1456. https://papers.nips.cc/paper/3293-a-game-theoretic-approach-to-apprenticeship-learning
Ramachandran, D. , and Amir, E. , 2007, “ Bayesian Inverse Reinforcement Learning,” 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, Jan. 6–12, pp. 2586–2591. https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-416.pdf

Figures

Fig. 1

(a) Summary of player participation and performance and (b) results from the game, showing that while most players failed to outperform the Bayesian optimization algorithm, some of them identified good solutions early on. (Reproduced with permission from Ren et al. [8,11]. Copyright 2016 and 2015 by ASME.)

Fig. 2

Four iterations of BO on a 1D function. Obj: The objective function. GP: Gaussian process model. EI: Expected improvement function. Image is modified from Ref. [11].
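For readers unfamiliar with the loop this figure depicts, a minimal textbook BO iteration (GP surrogate plus EI acquisition, posed as minimization) looks like the sketch below; the toy objective, RBF kernel, and grid resolution are illustrative assumptions, not the paper's implementation.

```python
# Generic 1D Bayesian optimization loop: fit a GP, maximize EI on a grid,
# sample, repeat. Toy objective and parameters are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bo_1d(f, bounds, n_iter=4, n_init=2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))      # initial random samples
    y = np.array([f(x[0]) for x in X])
    grid = np.linspace(lo, hi, 500).reshape(-1, 1)
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=RBF(1.0)).fit(X, y)
        mu, sigma = gp.predict(grid, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (y.min() - mu) / sigma                 # EI for minimization
        ei = (y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = grid[ei.argmax()]                 # next query point
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))
    return X, y

X, y = bo_1d(lambda x: np.sin(3 * x) + 0.5 * x, bounds=(0.0, 5.0))
print("best design:", X[y.argmin()], "score:", y.min())
```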

Fig. 3

The minimal cost L for search trajectory lengths N = 5,…,20 with respect to G_αBO and G_K0. α_INI is fixed to 1.0 and 10.0.

Fig. 4

(a) Comparison of BO convergence using four algorithmic settings: (orange) Λ = 10.0I, (green) Λ = 0.01I, (gray) the MLE of Λ is used for each new sample, and (red) the initial setting Λ = 10.0I is updated by IBO using the trajectory from Λ = 0.01I. (b) The percentages of estimated Λ̂_MLE along the number of iterations, averaged over the cases with Λ = {0.01I, 0.1I, 1.0I, 10.0I} and 30 trials for each case.
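The "MLE of Λ" setting in panel (a) amounts to re-estimating the GP length scales by maximizing the marginal likelihood each time a new sample arrives. A minimal sketch using scikit-learn's built-in marginal-likelihood optimizer on toy data (not the paper's code):

```python
# Re-fit the GP length scale by marginal-likelihood maximization;
# scikit-learn performs the optimization internally during fit().
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(8, 1))             # toy observed designs
y = np.sin(3 * X).ravel() + 0.5 * X.ravel()    # toy scores

gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0),  # poor initial guess
                              n_restarts_optimizer=5).fit(X, y)
print("MLE length scale:", gp.kernel_.length_scale)
```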

Fig. 5

Independent component analysis (ICA) bases learned from all human plays and the ecoRacer track. Vertical lines on the track correspond to the peak locations of the bases.
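Bases like those in Fig. 5 can be reproduced in spirit with any ICA implementation. The sketch below uses scikit-learn's FastICA on random stand-in data, since the actual matrix of ecoRacer plays is not available here; the shapes and number of components are assumptions.

```python
# Extract independent "basis" signals from a matrix of control trajectories
# (rows = plays, columns = track positions); data here is a random stand-in.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
plays = rng.standard_normal((200, 360))   # 200 plays x 360 track positions
ica = FastICA(n_components=10, random_state=0)
sources = ica.fit_transform(plays)        # per-play activations, (200, 10)
bases = ica.components_                   # basis signals, (10, 360)
peaks = bases.argmax(axis=1)              # peak track location of each basis
print(peaks)
```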

Fig. 6

The residual of the current best score versus the known best score, with settings λ̂ (IBO, red), λ̂_GP (MLE, blue), and the default λ = I (green). Results are shown as averages over 30 trials. One-sigma confidence intervals are calculated via 5000 bootstrap samples. Red and black dots are scores from P2 and P3, respectively.
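One way to obtain one-sigma bootstrap intervals like those described above: resample the 30 per-trial outcomes with replacement and take the spread of the resampled means. The sketch below uses stand-in data and assumes the interval is the standard deviation of the bootstrap means.

```python
# Bootstrap one-sigma interval for a mean over 30 trials (stand-in data).
import numpy as np

rng = np.random.default_rng(0)
trials = rng.normal(loc=0.3, scale=0.1, size=30)   # stand-in residuals
boot_means = np.array([
    rng.choice(trials, size=trials.size, replace=True).mean()
    for _ in range(5000)                           # 5000 bootstrap resamples
])
print(f"{trials.mean():.3f} +/- {boot_means.std():.3f}")
```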

Fig. 7

Qualitative comparison of control strategies from the theoretical optimal solution (top), one of the BO solutions (middle), and the best player solution (bottom)
