Many data extraction tasks of practical relevance require not only syntactic pattern matching but also semantic reasoning about the content of the underlying text. While regular expressions are very well suited for tasks that require only syntactic pattern matching, they fall short for data extraction tasks that involve both a syntactic and semantic component. To address this issue, we introduce semantic regexes, a generalization of regular expressions that facilitates combined syntactic and semantic reasoning about textual data. We also propose a novel learning algorithm that can synthesize semantic regexes from a small number of positive and negative examples. Our proposed learning algorithm uses a combination of neural sketch generation and compositional type-directed synthesis for fast and effective generalization from a small number of examples. We have implemented these ideas in a new tool called Smore and evaluated it on representative data extraction tasks involving several textual datasets. Our evaluation shows that semantic regexes can better support complex data extraction tasks than standard regular expressions and that our learning algorithm significantly outperforms existing tools, including state-of-the-art neural networks and program synthesis tools.
Wed 25 OctDisplayed time zone: Lisbon change
14:00 - 15:30 | |||
14:00 18mTalk | Mobius: Synthesizing Relational Queries with Recursive and Invented Predicates OOPSLA Aalok Thakkar University of Pennsylvania, Nathaniel Sands University of Southern California, Georgios Petrou University of Southern California, Rajeev Alur University of Pennsylvania, Mayur Naik University of Pennsylvania, Mukund Raghothaman University of Southern California DOI | ||
14:18 18mTalk | Data Extraction via Semantic Regular Expression Synthesis OOPSLA Jocelyn (Qiaochu) Chen University of Texas at Austin, Arko Banerjee University of Texas at Austin, Çağatay Demiralp Massachusetts Institute of Technology, Greg Durrett University of Texas at Austin, Işıl Dillig University of Texas at Austin DOI | ||
14:36 18mTalk | Synthesizing Efficient Memoization Algorithms OOPSLA DOI | ||
14:54 18mTalk | Algebro-geometric Algorithms for Template-Based Synthesis of Polynomial Programs OOPSLA Amir Kafshdar Goharshady Hong Kong University of Science and Technology, S. Hitarth Hong Kong University of Science and Technology, Fatemeh Mohammadi KU Leuven, Harshit Jitendra Motwani Ghent University DOI | ||
15:12 18mTalk | Modular Component-Based Quantum Circuit Synthesis OOPSLA DOI |