Abstract
Integrating behavioral phenomena into quantitative models of human cognition is a fundamental practice of the cognitive and behavioral sciences. Yet researchers are accumulating ever-larger amounts of data without the time or funding needed to integrate these data into scientific theories or to test the resulting theories in follow-up experiments. We seek to overcome these limitations by incorporating machine learning techniques into a closed-loop system for the generation, estimation, and empirical validation of scientific models. Critically, the system can also serve as a metascientific test bed, enabling researchers to examine different combinations of theory discovery methods and approaches to experimentation. As a case study, we demonstrate how the system can systematically evaluate how well such combinations recover established models of human behavior and brain function.