SPLASH 2023
Sun 22 - Fri 27 October 2023 Cascais, Portugal
Wed 25 Oct 2023 11:54 - 12:12 at Room I - AI4SE Chair(s): Guido Salvaneschi

This paper introduces a novel method, called WheaCha, for explaining the predictions of code models. Like attribution methods, WheaCha seeks to identify the input features responsible for a particular prediction a model makes. However, it differs from attribution methods in crucial ways. Specifically, for any given prediction, WheaCha separates an input program into "wheat" (i.e., the defining features that are the reason the model predicts the label it does) and the rest, the "chaff". We realize WheaCha in a tool, HuoYan, and use it to explain four prominent code models: code2vec, seq-GNN, GGNN, and CodeBERT. Results show that (1) HuoYan is efficient, taking on average under twenty seconds to compute the wheat for an input program in an end-to-end fashion (i.e., including model prediction time); (2) the wheat that all models use to make predictions is predominantly comprised of simple syntactic or even lexical properties (i.e., identifier names); (3) neither the latest explainability methods for code models (i.e., SIVAND and CounterFactual Explanations) nor the most noteworthy attribution methods (i.e., Integrated Gradients and SHAP) can precisely capture the wheat. Finally, we demonstrate the usefulness of WheaCha: in particular, we assess whether WheaCha's explanations can help end users identify defective code models (e.g., models trained on mislabeled data or that have learned spurious correlations from biased data). We find that, with WheaCha, users achieve far higher accuracy in identifying faulty models than with SIVAND, CounterFactual Explanations, Integrated Gradients, and SHAP.
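The wheat/chaff separation can be pictured as a reduction over input tokens: keep only the tokens without which the model's prediction changes. The following is a minimal, hypothetical sketch of that idea with a toy stand-in model; it is an illustration of the concept only, not the authors' HuoYan implementation, and `toy_model` and the token list are invented for the example.

```python
def find_wheat(tokens, predict):
    """Greedily drop tokens whose removal leaves the prediction unchanged.

    What survives approximates the "wheat" (defining features); every
    dropped token is "chaff". `predict` maps a token list to a label.
    """
    target = predict(tokens)  # the prediction being explained
    wheat = list(tokens)
    i = 0
    while i < len(wheat):
        trial = wheat[:i] + wheat[i + 1:]
        if trial and predict(trial) == target:
            wheat = trial  # token was chaff: prediction unchanged without it
        else:
            i += 1         # token is wheat: removing it flips the prediction
    return wheat


def toy_model(tokens):
    """Hypothetical method-name predictor: labels a snippet a "getter"
    when it contains a get-prefixed identifier and a return."""
    has_get = any(t.startswith("get") for t in tokens)
    return "getter" if has_get and "return" in tokens else "other"


tokens = ["public", "int", "getAge", "(", ")", "{", "return", "age", ";", "}"]
print(find_wheat(tokens, toy_model))  # → ['getAge', 'return']
```

Here the greedy pass discards the syntax tokens as chaff and retains `getAge` and `return` as the wheat, mirroring the paper's finding that predictions often rest on simple lexical properties such as identifier names.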

Wed 25 Oct

Displayed time zone: Lisbon

11:00 - 12:30
AI4SE (OOPSLA) at Room I
Chair(s): Guido Salvaneschi University of St. Gallen
11:00
18m
Talk
Grounded Copilot: How Programmers Interact with Code-Generating Models (Distinguished Paper)
OOPSLA
Shraddha Barke University of California at San Diego, Michael B. James University of California at San Diego, Nadia Polikarpova University of California at San Diego
DOI
11:18
18m
Talk
Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
OOPSLA
Alex Renda Massachusetts Institute of Technology, Yi Ding Purdue University, Michael Carbin Massachusetts Institute of Technology
DOI Pre-print
11:36
18m
Talk
Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving
OOPSLA
Fangke Ye Georgia Institute of Technology, Jisheng Zhao Georgia Institute of Technology, Jun Shirako Georgia Institute of Technology, Vivek Sarkar Georgia Institute of Technology
DOI
11:54
18m
Talk
An Explanation Method for Models of Code
OOPSLA
Yu Wang Nanjing University, Ke Wang, Linzhang Wang Nanjing University
DOI
12:12
18m
Talk
Optimization-Aware Compiler-Level Event Profiling
OOPSLA
Matteo Basso Università della Svizzera italiana (USI), Switzerland, Aleksandar Prokopec Oracle Labs, Andrea Rosà USI Lugano, Walter Binder USI Lugano
Link to publication DOI