SPLASH 2023 - OOPSLA Artifacts

Publish the software that supports your research!

Authors of conditionally-accepted PACMPL(OOPSLA) papers are invited to submit a software artifact that supports the claims in their papers. Per the ACM guidelines for Artifact Review and Badging, OOPSLA provides two types of validation for artifacts as badges that appear on the first page of the paper:

Artifact Available: This badge is for artifacts that are published in a permanent location (with a DOI). Artifacts do not need to be evaluated to receive this badge.
Artifact Evaluated: This badge is for artifacts that have been approved by the OOPSLA Artifact Evaluation Committee (AEC). There are two levels for the badge; papers can receive at most one of them:
1. Functional, for artifacts that adequately support the main scientific claims of the paper
2. Reusable, for artifacts that are Functional and facilitate reuse through careful documentation and clear organization.

Submission is voluntary. Artifact Evaluation is a service provided by the community to help authors of accepted papers extend the reach of their work and encourage future researchers to build on it.

See the Call for Artifacts tab for more information.

Call for Artifacts

Important Dates

OOPSLA Round 1

Jan. 6: Artifact Evaluation submission deadline
Jan. 7 – 20: Kick-the-tires period. Authors may communicate with the AEC throughout.
March 1: Evaluation decisions sent (Functional / Reusable)
(at camera-ready) March 10: Artifact Available submission deadline

OOPSLA Round 2

July 14: Artifact Evaluation submission deadline
July 15 – 31: Kick-the-tires period. Authors may communicate with the AEC throughout.
Sep. 1: Evaluation decisions sent (Functional / Reusable)
(at camera-ready) Sep. 10: Artifact Available submission deadline

Artifact Evaluation submission site: https://oopsla23aec.hotcrp.com/u/0/

Artifact Available submissions go through the publisher. They are due with the camera-ready materials for the OOPSLA paper.

Every artifact that passes evaluation (Functional or Reusable) is strongly encouraged to be Available unless there are licensing or privacy concerns about sharing it.

New This Year

We are delighted to promote Software Heritage as a way to host and cite source code.
Communication between authors and AEC members will be open during the entire kick-the-tires period via comments on the submission site. AEC members are encouraged to report issues early so that authors have plenty of time to debug.
Paper proofs are not accepted for evaluation.

Artifact Available

Artifacts that are publicly available in an archival location can earn the Available badge from the publisher. This badge is not controlled by the AEC, which has some important consequences:

Artifacts that were not submitted for evaluation can be Available,
Artifacts that did not pass evaluation can be Available, and
Artifacts that passed evaluation need not be Available to accommodate rare situations in which the authors must keep the artifact private.

The requirements for this badge are set by the publisher and will be provided with the camera-ready instructions for OOPSLA papers. In the past, there have been two primary options for earning the Available badge:

Option 1: Authors upload a snapshot of the artifact to Zenodo to receive a DOI. Uploads can be done manually, or through GitHub.
Option 2: Authors work with Conference Publishing to send their artifact to the ACM for hosting on the ACM DL.

Data-Availability Statement

To help readers find data and software, OOPSLA recommends adding a section just before the references titled Data-Availability Statement. If the paper has an artifact, cite it here. If there is no artifact, this section can explain how to obtain relevant code. The statement does not count toward the OOPSLA 2023 page limit. It may be included in the submitted paper; in fact we encourage this, even if the DOI is not ready yet.

Example:

\section{Conclusion}
....

\section*{Data-Availability Statement}
The software that supports~\cref{s:design,s:evaluation}
is available on Software Heritage~\cite{artifact-swh}
and Zenodo~\cite{artifact-doi}.

\begin{acks}
....

Software Heritage

Software Heritage (SH) is a nonprofit whose mission is to collect, preserve, and share all public code. For authors of OOPSLA papers, SH offers three major services:

Permanent links to directories, files, and code fragments
BibLaTeX styles for citing software
Automatic crawling of source code repositories for updates

For more information, read the Software Heritage HOWTO and FAQ guides. See also the browser extension and GitHub action for archiving code.

Two caveats: (1) the ACM does not yet accept SH permalinks for the Available badge, only DOIs; and (2) we recommend packaging source code artifacts with Docker (or a similar build tool) to avoid dependency issues.

Artifact Evaluated

The rest of this Call explains the AEC process for determining whether to award an evaluation badge. There are two levels: Functional and Reusable.

Functional: This is the basic “accepted” outcome for an artifact. An artifact can be awarded a Functional badge if the artifact supports all claims made in the paper, possibly excluding some minor claims if there are very good reasons why they cannot be supported.

In the ideal case, an artifact with this designation includes all relevant code, dependencies, input data (e.g., benchmarks), and the artifact’s documentation is sufficient for reviewers to reproduce the exact results described in the paper. If the artifact claims to outperform a related system in some way (in time, accuracy, etc.) and the other system was used to generate new numbers for the paper (e.g., an existing tool was run on new benchmarks not considered by the corresponding publication), artifacts should include a version of that related system, and instructions for reproducing the numbers used for comparison as well. If the alternative tool crashes on a subset of the inputs, simply note this as the expected behavior.

Deviations from the ideal must be for good reason. A non-exclusive list of justifiable deviations follows:

Some benchmark code is subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (e.g., licensed benchmark suites like SPEC, or when a tool is applied to private proprietary code). In such cases, the public benchmarks should be included. If all benchmark data for a major claim is private, alternative data should be supplied. Providing a tool with no meaningful inputs to evaluate on is not sufficient to justify claims that the artifact works.
Some of the results are performance data, and therefore exact numbers depend on the particular hardware. In this case, artifacts should explain how to recognize when experiments on other hardware reproduce the high-level results. For example, certain optimizations might exhibit a particular trend, or one tool might outperform another in a certain class of inputs.
Repeating the evaluation takes a very long time. If so, provide small and representative inputs to demonstrate the behavior. Reviewers may or may not reproduce the full results in such cases.
The evaluation requires specialized hardware (e.g., a CPU with a particular new feature, or a specific class of GPU, or a cluster of GPUs). Authors should contact the chairs as soon as possible to work out how to make such artifacts possible to evaluate. In past years, one outcome was that the authors of an artifact requiring specialized hardware paid for a cloud instance with the hardware, which reviewers could access anonymously.
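
For performance results, one way to tell reviewers how to recognize a reproduced trend is to ship a small check alongside the raw timings. The following is a minimal sketch, assuming each tool's running times are collected as a mapping from benchmark name to seconds. The function names and the median-speedup criterion are illustrative assumptions; an artifact should use whatever criterion matches the claim in its paper.

```python
from statistics import median


def speedups(baseline_seconds, tool_seconds):
    """Per-benchmark speedup of the tool over the baseline (>1 means faster)."""
    return {
        name: baseline_seconds[name] / tool_seconds[name]
        for name in baseline_seconds
        if name in tool_seconds
    }


def reproduces_trend(baseline_seconds, tool_seconds, min_median_speedup=1.0):
    """True if the median speedup across shared benchmarks exceeds the threshold."""
    ratios = speedups(baseline_seconds, tool_seconds)
    return bool(ratios) and median(ratios.values()) > min_median_speedup
```

A check like this compares relative behavior rather than absolute numbers, so it can hold on hardware that differs from the machine used for the paper.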

Reusable: A Reusable badge is given when the artifact satisfies the requirements to be functional and is additionally well-packaged, documented, and/or designed to support future research that might build on the artifact. Reusable artifacts should ideally:

explain in detail how the artifact supports the paper,
show how to adapt the artifact to new inputs, and
have client documentation that enables their reuse as a component in another project.

For binary-only artifacts to be considered Reusable, the client documentation is essential. It must be possible for others to directly use the binary in their own research. For example, a JAR artifact should explain how to use the JAR effectively as a component in other projects.

Selection Criteria

The artifact is evaluated in relation to the expectations set by the paper. For an artifact to be accepted, it must support the main claims made in the paper. Thus, in addition to just running the artifact, the evaluators will read the paper and may try to tweak provided inputs or otherwise slightly generalize the use of the artifact from the paper in order to test the artifact’s limits.

In general, artifacts should be:

consistent with the paper,
as complete as possible,
well documented, and
easy to reuse, facilitating further research.

The AEC strives to place itself in the shoes of such future researchers and then to ask: how would this artifact help me to reproduce the results and build on them?

Submission Process

All conditionally-accepted OOPSLA papers are eligible to submit artifacts.

Submissions require three parts:

an overview of the artifact,
a non-institutional URL pointing to either:
- a single file containing the artifact (recommended), or
- the address of a public source control repository
a hash certifying the version of the artifact at submission time: either
- an md5 hash of the single file (use the md5 or md5sum command-line tool to generate the hash), or
- the full commit hash for the repository (e.g., from git reflog --no-abbrev)
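
For the single-file option, the hash can also be computed programmatically. The following is a minimal sketch in Python, assuming the artifact is one archive on local disk. The function name md5_of_file and the file name in the usage comment are illustrative and not part of the submission requirements. The digest should match what the md5sum tool prints for the same file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Chunked reads keep memory use flat for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical file name):
#   print(md5_of_file("42.tgz"))
```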

The URL must be non-institutional to protect the anonymity of reviewers. Acceptable URLs can be obtained from Google Drive, Dropbox, GitLab, Zenodo, and many other providers. You may upload your artifact directly if it is a single file less than 15 MB.

Artifacts do not need to be anonymous. Reviewers will be aware of author identities.

Overview of the Artifact

The overview should consist of three parts:

a brief introduction,
a Getting Started Guide, and
Step-by-Step Instructions for how you propose to evaluate your artifact (with appropriate connections to the relevant sections of your paper).

In the introduction, briefly explain the purpose of the artifact and how it supports the paper. We recommend listing all claims in the paper and stating whether or not each is supported. For supported claims, say how the artifact provides support. For unsupported claims, explain why they are omitted.

In the Getting Started Guide, give instructions for setup and basic testing. List any software requirements and/or passwords needed to access the artifact. The instructions should take roughly 30 minutes to complete. Reviewers will follow the guide during an initial kick-the-tires phase and report issues as they arise.

The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.

In the Step-by-Step Instructions, explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out, note roughly how long it is expected to run, and explain how to run it on smaller inputs. Reviewers may choose to run on smaller inputs or larger inputs depending on available resources.
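
One simple way to support smaller inputs is a command-line switch on the experiment driver. The following is a minimal sketch, assuming a Python driver script. The flag name --small and the benchmark names are hypothetical placeholders, not something this Call prescribes.

```python
import argparse

# Hypothetical benchmark sets; a real artifact would list its own inputs.
SMALL_INPUTS = ["tiny-1", "tiny-2"]
FULL_INPUTS = SMALL_INPUTS + ["large-1", "large-2", "large-3"]


def build_parser():
    """Command-line interface with an opt-in switch for the quick run."""
    parser = argparse.ArgumentParser(description="Run the artifact's experiments.")
    parser.add_argument(
        "--small",
        action="store_true",
        help="run only the small, representative inputs",
    )
    return parser


def select_inputs(argv=None):
    """Return the inputs to run; argv=None falls back to sys.argv."""
    args = build_parser().parse_args(argv)
    return SMALL_INPUTS if args.small else FULL_INPUTS
```

With a switch like this, the instructions can give one command for the quick run and one for the full evaluation, each with its expected running time.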

Be sure to explain the expected outputs produced by the Step-by-Step Instructions. State where to find the outputs and how to interpret them relative to the paper. If there are any expected warnings or error messages, explain those as well. Ideally, artifacts should include sample outputs and logs for comparison.
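
When sample outputs are included, a small comparison script can save reviewers from checking logs by eye. The following is a minimal sketch, assuming the outputs are deterministic text files. The function name and file layout are illustrative assumptions. Outputs that contain timings or timestamps would need filtering before a line-by-line comparison is meaningful.

```python
import difflib
from pathlib import Path


def diff_against_sample(actual_path, sample_path):
    """Return unified-diff lines between a produced output and its sample.

    An empty list means the two files have identical lines.
    """
    actual = Path(actual_path).read_text().splitlines()
    sample = Path(sample_path).read_text().splitlines()
    return list(
        difflib.unified_diff(
            sample,
            actual,
            fromfile=str(sample_path),
            tofile=str(actual_path),
            lineterm="",
        )
    )
```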

Packaging the Artifact

When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members have a limited time in which to make an assessment of each artifact.

A good way to package artifacts is as a virtual machine (VM). VMs give an easily reproducible environment that is somewhat resistant to bit rot. They also give reviewers confidence that errors or other problems cannot cause harm to their machines. The major downside of VMs is that they rely on x86 hardware, which means that Apple machines with M1 or M2 chips cannot use them.
Source code artifacts should use Docker or another build tool to manage all compilation and dependencies. This improves the odds that the reviewers will be able to quickly and painlessly install the artifact — without getting lost in environment issues (e.g. what Python do I need?!).
Mechanized proof artifacts should follow the guidelines on this page: Proof Artifacts (accessed 2022-09-25). Be sure to explain how the mechanization encodes concepts and theorems from the paper. In our experience, it is difficult for a mechanized artifact to satisfy the requirements for Functional alone without also being Reusable because documentation is crucial for understanding whether the artifact faithfully supports the paper.

Submit your artifact as a single archive file and use the naming convention `<paper #>.<suffix>`, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents (such as .txt, .html, and .pdf).
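
Packaging can be scripted so that the archive name always follows the convention of the paper number followed by the format's suffix. The following is a minimal sketch using Python's standard tarfile module to produce a .tgz file. The function name, the directory name, and the paper number in the usage comment are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_artifact(source_dir, paper_number, out_dir="."):
    """Create <paper_number>.tgz containing source_dir and return its path.

    out_dir should lie outside source_dir so the archive does not include itself.
    """
    source = Path(source_dir)
    archive = Path(out_dir) / f"{paper_number}.tgz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the top-level folder.
        tar.add(str(source), arcname=source.name)
    return archive


# Usage (hypothetical directory and paper number):
#   pack_artifact("my-artifact", 42)   # writes ./42.tgz
```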

Based on the outcome of the previous editions (2019 AEC, 2020 AEC, 2021 AEC), the strongest recommendation we can give for ensuring quality packaging is to test your own directions on a fresh machine (or VM), following exactly the directions you have prepared.

While publicly available artifacts are often easier to review, and considered to be in the best interest of open science, artifacts are not required to be public and/or open source. The submission site will ask whether the artifact is private. Artifact reviewers will be instructed that such artifacts are for use only for artifact evaluation, that submitted versions of artifacts may not be made public by reviewers, and that copies of artifacts must not be kept beyond the review period.

Review Process Overview

Kick-the-tires

After authors submit their artifact, there is a short window of time in which the reviewers work through only the Getting Started instructions and upload preliminary reviews indicating whether or not they were able to get those 30-or-so minutes of instructions working. The preliminary reviews will be shared immediately with the authors, who may make modest updates and corrections in order to resolve any issues the reviewers encountered.

Additional rounds of interaction are allowed via comments throughout the initial kick-the-tires period. Our goal here is twofold: we want to give authors the opportunity to resolve issues early (before other reviewers rediscover them), and we want authors to have as much time as possible for debugging (more than the typical 3-day response window).

Full review

During the full review period, comments are closed by default but may be reopened at reviewers’ discretion to debug small issues. The purpose of re-opening communication is to maximize the number of Functional submissions.

COI

Conflicts of interest for AEC members are handled by the chairs. Conflicts of interest involving one of the two AEC chairs are handled by the other AEC chair, or the PC of the conference if both chairs are conflicted. Artifacts involving an AEC chair must be unambiguously accepted (they may not be borderline), and they may not be considered for the distinguished artifact award.

Common issues

In the kick-the-tires phase

Overstating platform support. Several artifacts claiming the need for only UNIX-like systems failed severely under macOS — in particular those requiring 32-bit compilers, which are no longer present in newer macOS versions. We recommend future artifacts scope their claimed support more narrowly.
Missing dependencies, or poor documentation of dependencies.

The most effective way to avoid these sorts of issues ahead of time is to run the instructions independently on a fresh machine, VM, or Docker container.
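
The Docker variant of this check can be scripted. The following is a minimal sketch, assuming Docker is installed and the Getting Started steps are collected in a shell script at the top of the artifact directory. The script name getting-started.sh, the mount point /artifact, and the ubuntu:22.04 base image are illustrative choices, not requirements of this Call.

```python
import subprocess
from pathlib import Path


def fresh_container_command(artifact_dir, image="ubuntu:22.04",
                            script="getting-started.sh"):
    """Build a `docker run` command that executes the guide in a clean container."""
    host_dir = Path(artifact_dir).resolve()
    return [
        "docker", "run", "--rm",          # discard the container afterwards
        "-v", f"{host_dir}:/artifact",    # mount the artifact directory
        "-w", "/artifact",                # start in the mounted directory
        image, "sh", script,
    ]


def run_in_fresh_container(artifact_dir, **options):
    """Run the command; raises CalledProcessError if the script fails."""
    subprocess.run(fresh_container_command(artifact_dir, **options), check=True)
```

Because the container starts from a bare image, any dependency that the script does not install itself shows up as a failure, which is the kind of problem reviewers would otherwise hit during kick-the-tires.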

In the full review phase

Comparing against existing tools on new benchmarks, but not including ways to reproduce the other tools’ executions.
Not explaining how to interpret results. Several artifacts ran successfully and produced the output that was the basis for the paper, but without any way for reviewers to compare these for consistency with the paper. Examples included generating a list of warnings without documenting which were true vs. false positives, and generating large tables of numbers that were presented graphically in the paper without providing a way to generate analogous visualizations.

FAQ

Q. My artifact requires hundreds of GB of RAM / hundreds of CPU hours / a specialized GPU / etc., that the AEC members may not have access to. How can we submit an artifact?: If the tool can run on an average modern machine, but may run extremely slowly in comparison to the hardware used for the paper's evaluation, document the expected running time and point to examples the AEC may be able to replicate in less time. If your system will simply not work at all without hundreds of GB of RAM, or other hardware requirements that typical machines will not satisfy, please contact the AEC chairs in advance to make arrangements. One option is to get suitable hardware from a cloud provider (for example, Cloudlab), and give reviewers anonymous access. (The AEC chairs will coordinate with reviewers to decide when the cloud reservation needs to be active.) Submissions using cloud instances or similar that are not cleared with the AEC chairs in advance will be summarily rejected.
Q. Can my artifact be accepted if some of the paper’s claims are not supported by the artifact, for example if some benchmarks are omitted or the artifact does not include tools we experimentally compare against in the paper?: In general yes (if good explanations are provided, as explained above), but if such claims are essential to the overall results of the paper, the artifact will be rejected. As an extreme example, an artifact with a working tool but no benchmarks (because they are closed-source) would be rejected. In this case, alternate benchmarks should be provided.
Q. Why do we need a DOI for the Available badge? Why not a GitHub or institutional URL?: A DOI is a strong assurance that the artifact will remain available indefinitely. By contrast, GitHub URLs are not permanent: it is possible to rewrite git commit history in a public repository (using rebase and --force, for example), users can delete public repositories, and GitHub itself might disappear like Google Code did (2015). Institutional URLs may also move and change over time.
Q. Reviewers identified things to fix in documentation or scripts for our artifact, and we'd prefer to publish the fixed version. Can we submit the improved version for the Available badge?: Yes.
Q. Can I get the Available badge without submitting an artifact? I'm still making the thing available!: Yes.
Q. Can I get the Available badge for an artifact that was not judged to be Functional? I'm still making the thing available!: Yes.
Q. Why doesn't the AEC accept paper proofs?: The AEC process is designed for software, not to provide rigorous evaluation of paper proofs. Its main strengths are checking that an artifact runs successfully and has a clear relation to the paper, neither of which is a serious issue for paper proofs. Authors should submit such proofs as supplementary material rather than as artifacts.

Contact

Please contact the AEC chairs Ben Greenman and Guillaume Baudart if you have any questions.

Slides

This year marked the second combined AEC + ERC (Artifact Evaluation + External Review Committee). Every AEC member reviewed papers for OOPSLA. When these papers advanced to the artifact stage, an ERC reviewer served as the lead AEC reviewer and led the discussions on what the artifact should include to support the paper.

Recruiting for the AEC+ERC went smoothly. The overlap between ERC work and AEC work helped a bit to streamline the process of finding claims that the artifact should support, but not substantially. Perhaps a better way is for PC members to agree on expectations for an artifact and forward those to the AEC.

This year was also the second year of ACM badges rather than SIGPLAN badges. We offered one badge (“Artifact Evaluated”) with two levels: Functional and Reusable.

The new style was a source of friction because the SIGPLAN badges, which many reviewers were familiar with and liked, considered functionality and reusability as two separate dimensions. The main question was: should Functional be awarded only to artifacts that 100% validate claims from the paper? If so, then well-packaged artifacts could be excluded from a (deserved) Reusable badge if they happened to include a tool with unreliable running times and the paper did not communicate the unreliability. Similarly, it is very difficult to audit a mechanized proof if the codebase is not documented to the Reusable level. As chairs we tended toward a lenient Functional badge, but this was suboptimal because it loses a quality measure that the SIGPLAN badge captured. For future years, we recommend that OOPSLA offers the ACM Reproduced badge (but not the Replicated badge) for artifacts that support the paper at the 100% level.

Results Overview

The AEC received 81 submissions total, split between 21 in round 1 and 60 in round 2. This is similar to last year.
- 44 artifacts received the Reusable badge (54%)
- 29 received Functional (36%)
- 8 did not receive a badge (10%)

With a few exceptions, each artifact received three reviews. The AEC commented among themselves and occasionally with the authors to decide borderline artifacts.

Two common problems among no-badge artifacts were unclear instructions and lack of a way to run small examples. In some cases, the AEC was unable to run artifacts with vague instructions. We attempted to mitigate such problems by allowing open communications throughout the 2-week kick-the-tires period and case-by-case during the month-long review period, but some authors were unresponsive. Lack of small examples was a problem for artifacts with heavy machine requirements. Authors should provide access to suitable machines and give the AEC (and future readers!) small examples to test on.

A fixable issue among Functional-but-not-Reusable artifacts is that it is not always clear how to evaluate reusability. A software library vs. a GUI app vs. a mechanized proof vs. a data analysis script may have different, reasonable requirements. Furthermore, an artifact may include some core components that should be reusable and some auxiliaries that are not necessary to build on the research. Since authors know their work best, we recommend they suggest reusability criteria in the artifact documentation (the Call already asks for step-by-step Functionality instructions).

Recommendations

For Authors: be sure that artifacts can run small examples in addition to full experiments. Strive to output figures/tables similar to the paper instead of raw text.
Have PC members set expectations for artifacts to save AEC members the work of combing through papers to uncover claims. The AEC focus should be on checking and discussing claims, not finding them.
Add the ACM Reproduced badge to OOPSLA. Follow the ACM requirements for the Functional badge rather than the stricter function + reproduced requirements from the SIGPLAN badge era.
Have authors explain what aspects of the artifact do and do not support reusability, just as authors already explain what aspects of functionality the artifact does and does not cover.

Distinguished Artifacts

The following artifacts received at least one nomination from an AEC reviewer and we the chairs agree that these artifacts are exceptionally high quality:

Proof Automation for Linearizability in Separation Logic
- Ike Mulder, Robbert Krebbers
The Essence of Verilog: A Tractable and Tested Operational Semantics for Verilog
- Qinlin Chen, Nairen Zhang, Jinpeng Wang, Tian Tan, Chang Xu, Xiaoxing Ma, Yue Li
A Deductive Verification Infrastructure for Probabilistic Programs
- Philipp Schröer, Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja
Validating IoT Devices with Rate-Based Session Types
- Grant Iraci, Cheng-En Chuang, Raymond Hu, Lukasz Ziarek

Distinguished Artifact Reviewers

Many AEC+ERC members wrote consistently excellent reviews and discussion comments. Our two distinguished reviewers gave excellent feedback in spite of major roadblocks:

Rob Sison
- UNSW Sydney
- Rob’s detailed reviews included several artifacts outside their research area in which their feedback was critical to reach a fair decision.
Shiwei Weng
- Johns Hopkins University
- Shiwei, after much trial and error, managed to find access to a critical hardware component for an artifact.

Ben Greenman

Artifact Evaluation Co-Chair

Brown University, USA

Guillaume Baudart

Artifact Evaluation Co-Chair

Inria

France

Jenna DiVincenzo (Wise)

Purdue University

Emanuele D’Osualdo

MPI-SWS

Germany

Andrew K. Hirsch

University at Buffalo, SUNY

United States

Victor Nicolet

Amazon Web Services

Canada

Shiwei Weng

Johns Hopkins University

United States

Sankha Narayan Guria

University of Kansas

United States

Philipp Schuster

University of Tübingen

Germany

Daming Zou

ETH Zurich

Switzerland

Oliver Bračevac

Galois, Inc.

United States

Hendrik van Antwerpen

Delft University of Technology

Netherlands

Rachit Nigam

Cornell University

United States

Alex Reinking

UC Berkeley

United States

Kartik Singhal

University of Chicago

United States

Benjamin Chung

Northeastern University

Jialu Zhang

Yale

Konstantinos Kallas

University of Pennsylvania

United States

Ali Ghanbari

Iowa State University

United States

Paul Gazzillo

University of Central Florida

United States

Liyi Li

University of Maryland

United States

Will Crichton

Brown University

United States

Kesha Hietala

Amazon Web Services

United States

Dan Barowy

Williams College

United States

Ashish Mishra

Purdue University

United States

Mae Milano

University of California at Berkeley

United States

Lelio Brun

National Institute of Informatics

Japan