SPLASH 2023
Sun 22 - Fri 27 October 2023 Cascais, Portugal
Fri 27 Oct 2023 14:54 - 15:12 at Room II - compilation and optimization 1 Chair(s): Will Crichton

Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss out on potential optimization opportunities, affecting the performance of the application. Modern compilers implement loop-versioning with dynamic checks for pointer disambiguation to enable the missed optimizations. Polyhedral access range analysis and symbolic range analysis enable 𝑂 (1) range checks for non-overlapping of memory accesses inside loops. However, these approaches work only for the loops in which the loop bounds are loop invariants. To address this limitation, researchers proposed a technique that requires 𝑂 (𝑙𝑜𝑔 𝑛) memory accesses for pointer disambiguation. Others improved the performance of dynamic checks to single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge.

In this work, we present a tool, Rapid, that further reduces the overheads of the allocator and dynamic checks proposed in the existing approaches. The key idea is to identify objects that need disambiguation checks using a profiler and allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits in the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient.

Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for CPU SPEC 2017 benchmarks. The maximum CPU overhead of our allocator is 0.57% with a geometric mean of -0.2% for CPU SPEC 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 CPU SPEC 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five CPU SPEC 2017 benchmarks.

Fri 27 Oct

Displayed time zone: Lisbon change

14:00 - 15:30
compilation and optimization 1OOPSLA at Room II
Chair(s): Will Crichton Brown University
14:00
18m
Talk
Formally Verifying Optimizations with Block Simulations
OOPSLA
Léo Gourdin Université Grenoble Alpes - CNRS - Grenoble INP - Verimag, Benjamin Bonneau Université Grenoble Alpes - CNRS - Grenoble INP - Verimag, Sylvain Boulmé Université Grenoble Alpes - CNRS - Grenoble INP - Verimag, David Monniaux Université Grenoble Alpes - CNRS - Grenoble INP - Verimag, Alexandre Bérard Université Grenoble Alpes - CNRS - Grenoble INP - Verimag
DOI Pre-print
14:18
18m
Talk
Back to Direct Style: Typed and Tight
OOPSLA
Marius Müller University of Tübingen, Philipp Schuster University of Tübingen, Jonathan Immanuel Brachthäuser University of Tübingen, Klaus Ostermann University of Tübingen
DOI Pre-print
14:36
18m
Talk
Hardware-Aware Static Optimization of Hyperdimensional Computations
OOPSLA
Pu (Luke) Yi Stanford University, Sara Achour Stanford University
DOI
14:54
18m
Talk
Rapid: Region-Based Pointer Disambiguation
OOPSLA
Khushboo Chitre IIIT Delhi, Piyus Kedia IIIT Delhi, Rahul Purandare University of Nebraska-Lincoln
DOI
15:12
18m
Talk
Automated Ambiguity Detection in Layout-Sensitive Grammars
OOPSLA
Jiangyi Liu Tsinghua University, Fengmin Zhu CISPA - Helmholtz Center for Information Security, Fei He Tsinghua University
DOI Pre-print