A paper consists of a constellation of artifacts that extend beyond the document itself: software, mechanized proofs, models, test suites, benchmarks, and so on. In some cases, the quality of these artifacts is as important as that of the document itself, yet our conferences offer no formal means to submit and evaluate anything but the paper. To address this, POPL has run an optional artifact evaluation process since POPL 2015, inspired by similar efforts in our community.
The goal of the artifact evaluation process is two-fold: to both reward and probe. Our primary goal is to reward authors who take the trouble to create useful artifacts beyond the paper. Sometimes the software tools that accompany the paper take years to build; in many such cases, authors who go to this trouble should be rewarded for setting high standards and creating systems that others in the community can build on. Conversely, authors sometimes take liberties in describing the status of their artifacts, making claims they would temper if they knew the artifacts were going to be scrutinized. The prospect of evaluation thus leads to more accurate reporting.
Our hope is that eventually, the assessment of a paper’s accompanying artifacts will guide the decision-making about papers: that is, the AEC would inform and advise the Program Committee (PC). This would, however, represent a radical shift in our conference evaluation processes; we would rather proceed gradually. Thus, in our process, artifact evaluation is optional, and authors choose to undergo evaluation only after their paper has been accepted. Nonetheless, feedback from the Artifact Evaluation Committee can help improve both the final version of the paper and any publicly released artifacts.
The evaluation criteria are ultimately simple. A paper sets up certain expectations of its artifacts based on its content. The AEC will read the paper and then judge how well the artifact matches these expectations. Thus the AEC’s decision will be that the artifact does or does not “conform to the expectations set by the paper”. Ultimately, we expect artifacts to be:
- consistent with the paper,
- as complete as possible,
- documented well, and
- easy to reuse, facilitating further research.
We believe the dissemination of artifacts benefits our science and engineering as a whole. Their availability improves replicability and reproducibility, and enables authors to build on top of each other’s work. It can also help resolve, with less ambiguity, questions about cases not considered by the original authors.
Beyond helping the community as a whole, artifact evaluation confers several direct and indirect benefits on the authors themselves. The most direct benefit is, of course, the recognition that the authors accrue. But the very act of creating a bundle that can be used by the AEC confers several benefits:
- The same bundle can be distributed to third parties.
- A bundle can be used subsequently for later experiments (e.g., on new parameters).
- The bundle makes it simpler to re-run the system later when, say, responding to a journal reviewer’s questions.
- The bundle is more likely to survive being put in storage between the departure of one student and the arrival of the next.
However, creating a bundle that meets all these properties can be onerous. Therefore, the process we describe below does not require an artifact to have all these properties. It offers a route to evaluation that confers fewer benefits for vastly less effort.
To maintain a wall of separation between paper review and the artifacts, authors will be asked to upload their artifacts only after their papers have been accepted. Of course, they can (and should!) prepare their artifacts well in advance, and can provide the artifacts to the PC via supplemental materials, as many authors already do.
The authors of all accepted papers will be asked whether they intend to have their artifact evaluated and, if so, to upload the artifact. They are welcome to indicate that they do not.
After artifact submission, the AEC will download and install the artifact (where relevant), and evaluate it. Since we anticipate small glitches with installation and use, the AEC Chairs may communicate with authors to help resolve glitches. The AEC will complete its evaluation and notify authors of the outcome. There is approximately one week between feedback from the AEC and the deadline for the final versions of accepted papers. This is intended to allow authors sufficient time to include the feedback from the AEC as they deem fit.
The PC Chair’s report will include a discussion of the artifact evaluation process. Papers with artifacts that “meet expectations” may indicate this with the artifact evaluation badge (courtesy of Matthias Hauswirth).
This year, we will also use ACM’s badges to mark papers in the ACM Digital Library. Papers that pass artifact evaluation will also receive ACM’s “Artifacts Evaluated - Reusable” badge. Papers that pass artifact evaluation and whose authors also make their artifacts permanently publicly available (e.g., on GitHub or ACM DL) will additionally receive ACM’s “Artifacts Available” badge.
To avoid excluding some papers, the AEC will try to accept any artifact that authors wish to submit. These can be software, mechanized proofs, test suites, data sets, and so on. Obviously, the better the artifact is packaged, the more likely the AEC can actually work with it.
Since POPL 2017, the AEC has decided not to accept paper proofs in the artifact evaluation process. The AEC lacks the time and often the expertise to carefully review paper proofs. We hope that reserving the artifact evaluated badge for mechanized proofs that are easy to check and reuse will incentivize more POPL authors to mechanize their metatheory in a proof assistant.
While we encourage open research, submission of an artifact does not constitute tacit permission to make its content public. AEC members will be instructed that they may not publicize any part of your artifact during or after completing evaluation, nor retain any part of it after evaluation. Thus, you are free to include models, data files, proprietary binaries, etc. in your artifact. Authors of submitted artifacts will be asked whether their artifact is publicly available (e.g., on GitHub) and whether they want to provide a URL to be linked from the POPL AEC website if the artifact is accepted.
We strongly encourage you to anonymize any data files that you submit. We recognize that some artifacts may attempt to perform malicious operations by design. These cases should be boldly and explicitly flagged in detail in the readme so AEC members can take appropriate precautions before installing and running these artifacts.
The AEC will consist of about 20-25 members. Other than the chairs, we intend for the members to be a combination of senior graduate students, postdocs, and researchers, identified with the help of the POPL Program Committee and External Review Committee.
Qualified graduate students are often in a much better position than many researchers to handle the diversity of systems expectations we will encounter. In addition, these graduate students represent the future of the community, so involving them in this process early will help push this process forward. However, participation in the AEC can also provide useful insight into the value of artifacts and the process of artifact evaluation, and can help establish community norms for artifacts. We therefore seek to include a broad cross-section of the POPL community on the AEC.
Naturally, the AEC chairs will devote considerable attention to both mentoring and monitoring the junior members of the AEC, helping to educate the students on their responsibilities and privileges.
Submit an Artifact
After your paper has been accepted, please go to the website to register and submit an artifact.
- Artifact registration deadline: Tuesday, 3 October 2017
- Artifact submission deadline: Friday, 6 October 2017
Information for Committee Members
- Artifact registration (for authors): Tuesday, 3 October 2017
- Artifact bidding deadline (for AEC): Thursday, 5 October 2017
- Artifact submission deadline (for authors): Friday, 6 October 2017
- Reviews due (for AEC): Monday, 23 October 2017
- Artifact decisions announced: Friday, 27 October 2017
- POPL camera-ready deadline: Monday, 30 October 2017
Please don’t leave tasks to the last minute! Please try to install the artifacts early so we have time to contact the authors and troubleshoot if needed. Please submit your reviews early. This will give us more time to read each other’s reviews and understand the relative quality of the artifacts.
Immediately after authors submit the artifacts, please try to install/setup the artifact. Please do this as soon as you can, so we have time to troubleshoot any issues. The AEC Chairs can reach out to the authors to help resolve any technical issues.
Artifact Evaluation Guidelines
Once you have installed the artifact, you can start evaluating it!
Two main things to keep in mind:
- Read the paper and write a review that discusses the following: Does the artifact meet the expectations set by the paper?
- The paper has already been accepted, so don’t review the paper. Review the artifact.
The only real rubric is what’s on the review form. Every artifact is different and a more fine-grained rubric wouldn’t make sense. This is not a completely objective process and that’s okay. We want to know if the artifact meets your expectations as a researcher. Does something in the artifact annoy you or delight you? You should say so in your review. Note that while the ideal may be replicability (i.e., obtaining the same results using the authors’ artifact), there are many reasons why we as a committee may be unable to replicate the results yet still deem the artifact as meeting expectations. For example, it may be difficult or impossible for the authors to provide a bundled artifact that allows replication.
We encourage you to try your own tests, but don’t be a true adversary. This is research software, so stuff will break. Assume the authors acted in good faith and aren’t trying to hoodwink us. Instead, suppose you had to use/modify the artifact for your own research. Do you think you could? You’ll have to imagine and extrapolate, but that’s okay.
You may find it easier to adjust scores once you’ve reviewed all 2-3 of your artifacts. Moreover, once you read others’ reviews, you’ll get a better sense of average artifact quality. Don’t hesitate to change your scores later.
Finally, if the paper says “we have software/data for X and Y”, but the artifact is only “X”, that’s okay. But it should be crystal clear from the paper that the artifact that was evaluated only did “X”. Say so in your review. Feel free to say, “I wish you’d also provided Y as an artifact”, but know that it won’t affect this paper.