SPLASH 2023
Sun 22 - Fri 27 October 2023 Cascais, Portugal
Thu 26 Oct 2023 11:18 - 11:36 at Room II - language semantics Chair(s): Sebastian Erdweg

Regular expressions (regexes) are ubiquitous in modern software. There is a variety of implementation techniques for regex matching, which can be roughly categorized as (1) relying on backtracking search, or (2) being based on finite-state automata. The implementations that use backtracking are often chosen due to their ability to support advanced pattern-matching constructs. Unfortunately, they are known to suffer from severe performance problems. For some regular expressions, the running time for matching can be exponential in the size of the input text. In order to provide stronger guarantees of matching efficiency, automata-based regex matching is the preferred choice. However, even these regex engines may exhibit severe performance degradation for some patterns. The main reason for this is that regexes used in practice are not exclusively built from the classical regular constructs, i.e., concatenation, nondeterministic choice and Kleene's star. They involve additional constructs that provide succinctness and convenience of expression. The most common such construct is bounded repetition (also called counting), which describes the repetition of the pattern a fixed number of times.

In this paper, we propose a new algorithm for the efficient matching of regular expressions that involve bounded repetition. Our algorithms are based on a new model of automata, which we call nondeterministic bit vector automata (NBVA). This model is chosen to be expressively equivalent to nondeterministic counter automata with bounded counters, a very natural model for expressing patterns with bounded repetition. We show that there is a class of regular expressions with bounded repetition that can be matched in time that is independent from the repetition bounds. Our algorithms are general enough to cover the vast majority of challenging bounded repetitions that arise in practice. We provide an implementation of our approach in a regex engine, which we call BVA-Scan. We compare BVA-Scan against state-of-the-art regex engines on several real datasets.

Thu 26 Oct

Displayed time zone: Lisbon change

11:00 - 12:30
language semanticsOOPSLA at Room II
Chair(s): Sebastian Erdweg JGU Mainz
11:00
18m
Talk
The Essence of Verilog: A Tractable and Tested Operational Semantics for Verilog
OOPSLA
Qinlin Chen Nanjing University, Nairen Zhang Nanjing University, Jinpeng Wang Nanjing University, Tian Tan Nanjing University, Chang Xu Nanjing University, Xiaoxing Ma Nanjing University, Yue Li Nanjing University
DOI
11:18
18m
Talk
Regular Expression Matching using Bit Vector Automata
OOPSLA
Alexis Le Glaunec Rice University, Lingkun Kong Rice University, Konstantinos Mamouras Rice University
DOI
11:36
18m
Talk
Bidirectional Object-Oriented Programming: Towards Programmatic and Direct Manipulation of Objects
OOPSLA
Xing Zhang Peking University, Guanchen Guo Peking University, Xiao He University of Science and Technology Beijing, Zhenjiang Hu Peking University
DOI
11:54
18m
Talk
Bring Your Own Data Structures to DatalogDistinguished Paper
OOPSLA
Arash Sahebolamri Syracuse University, Langston Barrett Galois, Scott Moore Galois, Kristopher Micinski Syracuse University
DOI
12:12
18m
Talk
Rhombus: A New Spin on Macros without All the Parentheses
OOPSLA
Matthew Flatt University of Utah, Taylor Allred University of Utah, Nia Angle independent, Stephen De Gabrielle independent, Robert Bruce Findler Northwestern University, Jack Firth independent, Kiran Gopinathan National University of Singapore, Ben Greenman University of Utah, Siddhartha Kasivajhula independent, Alex Knauth independent, Jay McCarthy Reach, Sam Phillips independent, Sorawee Porncharoenwase University of Washington, Jens Axel Søgaard independent, Sam Tobin-Hochstadt Indiana University
DOI Pre-print