Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
Programmers and researchers are increasingly developing surrogates of programs (models of a subset of a given program's observable behavior) to solve a variety of software development challenges. Programmers train surrogates from measurements of a program's behavior on a dataset of input examples. A key challenge in surrogate construction is deciding what data to use to train a surrogate of a given program.
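For concreteness, here is a minimal Python sketch of how such a measurement dataset might be collected. The program `toy_program`, its two paths, and the helper `measure` are hypothetical. The sketch assumes the program has been instrumented to report which execution path each input takes, because that label is what lets a dataset be split into regions of the input space.

```python
import math
import random


def toy_program(x: float) -> tuple[float, str]:
    """Hypothetical program under study; returns (output, execution-path label)."""
    if x < 0.0:
        return 2.0 * x, "neg"
    return math.sin(3.0 * x) * x, "pos"


def measure(program, num_samples: int, lo: float, hi: float, seed: int = 0):
    """Run `program` on uniformly sampled inputs and record (input, output, path)."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_samples):
        x = rng.uniform(lo, hi)
        y, path = program(x)
        dataset.append((x, y, path))
    return dataset


data = measure(toy_program, 1000, -1.0, 1.0)

# Count how many samples landed on each execution path.
counts: dict[str, int] = {}
for _, _, path in data:
    counts[path] = counts.get(path, 0) + 1
print(counts)
```

Uniform sampling like this spends samples in proportion to each path's share of the input space, regardless of how hard that path is to learn. That allocation is exactly what complexity-guided sampling changes.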
We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate this methodology on a range of real-world programs, demonstrating that complexity-guided sampling empirically improves surrogate accuracy.
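The abstract does not spell out either the complexity measure or the sampling rule, so the following Python sketch fills both in with explicit assumptions rather than reproducing Turaco itself. It assumes (1) that a path computing an analytic function f is abstracted by the "tilde" function obtained by replacing each Taylor coefficient of f with its absolute value (so sin becomes sinh, and sums and products are bounded compositionally), with the path's complexity taken as the square of that function's derivative at 1; and (2) that the surrogate's error on path i scales like p_i * sqrt(c_i / n_i), where p_i is the probability that an input takes path i and n_i is the number of samples drawn from it. Minimizing the total error for a fixed budget n then gives n_i proportional to (p_i^2 * c_i)^(1/3). The names `Tilde`, `path_complexity`, and `allocate`, and the two-path toy program, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tilde:
    """Assumed complexity abstraction: value and derivative at y = 1 of the function
    obtained by replacing each Taylor coefficient with its absolute value."""
    val: float
    der: float

    def __add__(self, other: "Tilde") -> "Tilde":
        return Tilde(self.val + other.val, self.der + other.der)

    # Subtraction is over-approximated like addition, since |a - b| <= |a| + |b|.
    __sub__ = __add__

    def __mul__(self, other: "Tilde") -> "Tilde":
        # Product rule for the derivative of the product of two tilde functions.
        return Tilde(self.val * other.val,
                     self.der * other.val + self.val * other.der)


def const(c: float) -> Tilde:
    return Tilde(abs(c), 0.0)


X = Tilde(1.0, 1.0)  # the input x abstracts to y, evaluated at y = 1


def sin(t: Tilde) -> Tilde:
    # Absolute Taylor coefficients of sin give sinh; chain rule for the derivative.
    return Tilde(math.sinh(t.val), math.cosh(t.val) * t.der)


def path_complexity(t: Tilde) -> float:
    # Assumed measure: square of the tilde function's derivative at 1.
    return t.der ** 2


def allocate(probs: dict, complexities: dict, n: int) -> dict:
    """Split n samples with n_i proportional to (p_i^2 * c_i)^(1/3), the minimizer
    of sum_i p_i * sqrt(c_i / n_i) subject to sum_i n_i = n."""
    weights = {k: (probs[k] ** 2 * complexities[k]) ** (1 / 3) for k in probs}
    total = sum(weights.values())
    exact = {k: n * w / total for k, w in weights.items()}
    alloc = {k: math.floor(v) for k, v in exact.items()}
    # Hand leftover samples to the paths with the largest fractional parts.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Paths of the hypothetical toy program: x < 0 -> 2*x ; x >= 0 -> sin(3*x) * x
complexities = {
    "neg": path_complexity(const(2.0) * X),
    "pos": path_complexity(sin(const(3.0) * X) * X),
}
probs = {"neg": 0.5, "pos": 0.5}  # uniform inputs on [-1, 1]
print(allocate(probs, complexities, 1000))
```

Under these assumptions the sin path receives most of the budget even though both paths cover half of the input space, whereas uniform sampling would split the budget roughly evenly. The exponent 1/3 comes from setting the derivative of the Lagrangian to zero: p_i * sqrt(c_i) * n_i^(-3/2) must be equal across paths.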
Wed 25 Oct (times shown in the Lisbon time zone)
Session, 11:00 - 12:30
11:00 (18 min talk) Grounded Copilot: How Programmers Interact with Code-Generating Models (OOPSLA). Shraddha Barke, Michael B. James, Nadia Polikarpova (University of California at San Diego)
11:18 (18 min talk) Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs (OOPSLA). Alex Renda (Massachusetts Institute of Technology), Yi Ding (Purdue University), Michael Carbin (Massachusetts Institute of Technology)
11:36 (18 min talk) Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving (OOPSLA). Fangke Ye, Jisheng Zhao, Jun Shirako, Vivek Sarkar (Georgia Institute of Technology)
11:54 (18 min talk) An Explanation Method for Models of Code (OOPSLA)
12:12 (18 min talk) Optimization-Aware Compiler-Level Event Profiling (OOPSLA). Matteo Basso (Università della Svizzera italiana, USI, Switzerland), Aleksandar Prokopec (Oracle Labs), Andrea Rosà (USI Lugano), Walter Binder (USI Lugano)