Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps (MPLR 2023)

Sun 22 - Fri 27 October 2023 Cascais, Portugal

Who

Juan Fumero, Florin Blanaru, Athanasios Stratikopoulos, Steve Dohrmann, Sandhya Viswanathan, Christos Kotselidis

Track

MPLR 2023

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 22 Oct 2023 16:22 - 16:45 at Room II - MPLR Session 4 Chair(s): Stefan Marr

Abstract

Adopting heterogeneous execution on GPUs and FPGAs in managed runtime systems, such as Java, is a challenging task due to the complexities of the underlying virtual machine. Most current work has focused on compiler toolchains that solve the challenge of transparent just-in-time compilation of different code segments onto the accelerators. However, apart from providing automatic code generation, another paramount challenge is the seamless interoperability between the host memory manager and the Garbage Collector (GC). Currently, heterogeneous programming models that run on top of managed runtime systems, such as Aparapi and TornadoVM, need to block the GC when running native code (e.g., JNI code) in order to prevent the GC from moving data while the native code is still running on the hardware accelerator.

To tackle the inefficiency of locking the GC while the GPU operates, this paper proposes a novel Unified Memory (UM) allocator for heterogeneous programming frameworks for managed runtime systems. In this paper, we show how, by making small changes to a Java runtime system, automatic memory management can be enhanced to perform object reclamation not only on the host, but also on the device. This is done by allocating the Java Virtual Machine's object heap in unified memory, which is visible to all hardware accelerators. In this manner (although explicit data synchronization between the host and the device is still required to ensure data consistency) we enable transparent page migration of Java heap-allocated objects between the host and the accelerator, since our UM system is aware of pointers and object migration due to GC collections. This technique has been implemented in the context of MaxineVM, an open-source research VM for Java written in Java. We evaluated our approach on a discrete and an integrated GPU, showcasing under which conditions UM can benefit execution across different benchmarks and configurations. We concluded that when hardware acceleration is not employed, UM does not pose significant overheads unless memory-intensive workloads are encountered, which can exhibit up to 12% (worst case) and 2% (average) slowdowns. In addition, if hardware acceleration is used, UM can achieve up to 9.3x speedup compared to the non-UM baseline implementation for integrated GPUs.

Link to Preprint

https://github.com/jjfumero/jjfumero.github.io/blob/master/files/papers/2023/jfumero-mplr2023-unified-memory.pdf

DOI

https://doi.org/10.1145/3617651.3622984

Juan Fumero

University of Manchester

United Kingdom

Florin Blanaru

Axelera AI

Netherlands

Athanasios Stratikopoulos

University of Manchester

United Kingdom

Steve Dohrmann

Intel

United States

Sandhya Viswanathan

Intel

United States

Christos Kotselidis

University of Manchester

United Kingdom

Session Program

Sun 22 Oct
Displayed time zone: Lisbon

16:00 - 17:30	MPLR Session 4 (MPLR) at Room II. Chair(s): Stefan Marr (University of Kent)

16:00 22m Talk		Comparing Rapid Type Analysis with Points-To Analysis in GraalVM Native Image (MPLR). David Kozak (Brno University of Technology), Vojin Jovanovic (Oracle Labs), Codrut Stancu (Oracle Labs), Tomáš Vojnar (Brno University of Technology), Christian Wimmer (Oracle Labs)
16:22 23m Talk		Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps (MPLR). Juan Fumero (University of Manchester), Florin Blanaru (Axelera AI), Athanasios Stratikopoulos (University of Manchester), Steve Dohrmann (Intel), Sandhya Viswanathan (Intel), Christos Kotselidis (University of Manchester)
16:45 15m Talk		Beyond RSS: Towards Intelligent Dynamic Memory Management (Work in Progress) (MPLR). Christos Lamprakos (National Technical University of Athens; KU Leuven), Sotirios Xydis (National Technical University of Athens), Peter Kourzanov (IMEC), Manu Perumkunnil (IMEC), Francky Catthoor (IMEC; KU Leuven), Dimitrios Soudris (National Technical University of Athens)
17:00 15m Talk		Towards Safe HPC: Productivity and Performance via Rust Interfaces for a Distributed C++ Actors Library (Work in Progress) (MPLR). John Parrish (Georgia Institute of Technology), Nicole Wren (Block; Georgia Institute of Technology), Tsz Hang Kiang (Georgia Institute of Technology), Akihiro Hayashi (Georgia Institute of Technology), Jeffrey Young (Georgia Institute of Technology), Vivek Sarkar (Georgia Institute of Technology)
17:15 15m Talk		Generating Java Interfaces for Accessing Foreign Objects in GraalVM (Work in Progress) (MPLR). Julian Garn (JKU Linz), Florian Angerer (Oracle Labs), Hanspeter Mössenböck (JKU Linz)