Two Birds with One Stone: Boosting Code Generation and Code Search via a Generative Adversarial Network
Automatically transforming developers' natural language descriptions into source code has been a longstanding goal in software engineering research.
Two types of approaches have been proposed in the literature to achieve this: code generation, which involves generating a new code snippet, and code search, which involves reusing existing code.
Despite these efforts, however, the effectiveness of state-of-the-art techniques remains limited.
To advance further, our insight is that code generation and code search can each help overcome the other's limitations:
the code generator can benefit from feedback on the quality of its generated code, which the code searcher can provide, while the code searcher can benefit from additional training data produced by the code generator, which helps it better understand code semantics.
Drawing on this insight, we propose a novel approach that combines code generation and code search techniques using a generative adversarial network (GAN), enabling mutual improvement through adversarial training.
Specifically, we treat code generation and code search as the generator and discriminator in the GAN framework, respectively, and incorporate several customized designs for our tasks.
We evaluate our approach in eight different settings, and consistently observe significant performance improvements for both code generation and code search.
For instance, when using NatGen, a state-of-the-art code generator, as the generator and GraphCodeBERT, a state-of-the-art code searcher, as the discriminator, we achieve a 32% increase in CodeBLEU score for code generation and a 12% increase in mean reciprocal rank for code search on a large-scale Python dataset, compared to their original performance.
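For readers unfamiliar with the code search metric mentioned above, mean reciprocal rank (MRR) averages, over all queries, the reciprocal of the rank at which the correct snippet is retrieved. The following is a minimal sketch of the standard definition, not the evaluation script used in this work; the function name `mean_reciprocal_rank`, the convention that a rank of 0 means "not retrieved", and the example ranks are assumptions made purely for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Compute MRR from 1-based ranks of the correct snippet per query.

    Assumption for this sketch: a rank of 0 means the correct snippet
    was not retrieved, and it contributes 0 to the average.
    """
    if not ranks:
        raise ValueError("ranks must contain at least one query")
    if any(r < 0 for r in ranks):
        raise ValueError("ranks must be non-negative")
    # Sum the reciprocal ranks, then average over all queries.
    total = sum(1.0 / r if r > 0 else 0.0 for r in ranks)
    return total / len(ranks)


# Hypothetical ranks for three queries: the correct snippet is
# ranked 1st, 2nd, and 4th, respectively.
example_ranks = [1, 2, 4]
example_mrr = mean_reciprocal_rank(example_ranks)  # (1 + 1/2 + 1/4) / 3
```

A higher MRR means the correct snippet tends to appear closer to the top of the ranked results, with 1.0 indicating that it is always ranked first.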