Computational Reproducibility at Exascale: CRE2017

Synopsis

Where: Part of SC17, Denver, CO
When: Sunday afternoon, Nov 12, 2017
Submit: https://easychair.org/conferences/?conf=cre2017
Deadline: Friday, September 15, 2017
Notifications: Monday, October 2, 2017
Full Papers: Monday, October 9, 2017
Organized by: Walid Keyrouz (NIST), Miriam Leeser (NEU), and Michael Mascagni (FSU & NIST)
Registration: handled by SC17 (http://sc17.supercomputing.org/)

Motivation and Previous Offerings

This workshop combines the Numerical Reproducibility at Exascale Workshops (conducted in 2015 and 2016 at SC) and the panel on Reproducibility held at SC'16 (originally a BOF at SC'15) to address several different issues in reproducibility that arise when computing at exascale. The workshop will cover issues of numerical reproducibility as well as approaches and best practices for sharing and running code and for the reproducible dissemination of computational results. The workshop is meant to address the scope of the problems of computational reproducibility in HPC in general, and those anticipated as we scale up to Exascale machines in the next decade. The participants of this workshop will include government, academic, and industry stakeholders; the goals of this workshop are to understand the current state of the problems that arise, what work is being done to deal with these issues, and what the community thinks the possible approaches to these problems are.

Agenda

There will be two plenary speakers (25 min + 5 min Q&A each), four contributed talks (15 min presentation + 5 min Q&A each), and a 40 min panel. The agenda is shown below.

Start    Speakers, Affiliations, and Title
2:00 pm  John Gustafson (A*STAR, NUS)
         Plenary I: TBD
2:30 pm  K. Sato, I. Laguna, G. Lee, M. Schulz, C. Chambreau, D. Ahn, S. Atzeni, M. Bentley, G. Gopalakrishnan, Z. Rakamaric, G. Sawaya, J. Protze
         LLNL, Univ. of Utah, RWTH Aachen Univ.
         PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers
2:50 pm  S. Mahajan, K. Evans, J. Kennedy, M. Xu, M. Norman, M. Branstetter
         ORNL
         Solution Reproducibility of Earth System Models Continually Adapting for Exascale Computing
3:10 pm  R. Iakymchuk, S. Graillat, E. Laure, E. Quintana-Ortí
         KTH, Sorbonnes Univ., Univ. Jaime I
         Towards a Reproducible Solution of Linear Systems
3:30 pm  Break (food provided by SC)
4:00 pm  J. Demmel (UC Berkeley), P. Ahrens (MIT), H.D. Nguyen (UC Berkeley)
         Plenary II: Reproducible Floating Point Summation and the BLAS
4:30 pm  L. Pouchard, S. Baldwin, T. Elsethaggen, C. Gamboa, S. Jha, B. Raju, E. Stephan, L. Tang, K. Kleese Van Dam
         BNL, LLNL, PNNL
         Use Cases of Computational Reproducibility for Scientific Workflows at Exascale
4:50 pm  D. Bailey, L. Barba, N. Burgess, B. Debusschere, G. Gopalakrishnan
         UC Davis, GWU, Arm, SNL, U. Utah
         Panel Discussion on Computational Reproducibility (40 min)

Call for Participation

Experimental reproducibility is a cornerstone of the scientific method. As computing has grown into a powerful tool for scientific inquiry, computational reproducibility has been one of the core assumptions underlying scientific computing. With "traditional" single-core CPUs, documenting a numerical result was relatively straightforward. However, hardware developments over the past several decades have made it almost impossible to ensure computational reproducibility or to even fully document a computation without incurring a severe loss of performance. This loss of reproducibility started when systems combined parallelism (e.g., clusters) with non-determinism (e.g., single-core CPUs with out-of-order execution). It has accelerated with recent architectural trends towards platforms with increasingly large numbers of processing elements, namely multicore CPUs and compute accelerators (GPUs, Intel Xeon Phi, FPGAs).

Programmers targeting these platforms rely on tools and libraries to produce codes or execute them efficiently. As a result, codes can run efficiently, but have execution details that can be impossible to predict and are often very difficult to understand after execution. Furthermore, parallel implementations often result in code with varying execution orders between runs, leading to nonreproducible computations. The underlying reasons are that (1) the hardware and system software allocate parallel work in ways that are not always specifiable at compile time and (2) the execution often proceeds in an opportunistic manner with the execution order changing between runs. As such, floating-point computations, which are non-associative, can have different execution orders and execute on different processing elements between runs, leading to runs with varying results as a matter of course. The predictability of systems is further complicated by two issues that are becoming more critical as systems grow in scale: (1) interconnect systems with latencies that are often outside the control of programmers and (2) reliability, as the mean time between failures (MTBF) is now measured in hours on large systems. This situation seriously affects the ability to rely on scientific computations as a metrological substitute for experimentation!
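
As an illustration of the order dependence described above, the following minimal Python sketch (not part of the original call) sums the same four numbers in two different orders, the second mimicking a two-way parallel reduction. Because IEEE 754 double-precision addition is not associative, regrouping the operands changes the result. The helper names `left_to_right_sum` and `chunked_sum` and the sample values are assumptions chosen for illustration; `math.fsum` from the Python standard library is used only as a correctly rounded reference.

```python
import math


def left_to_right_sum(values):
    """Sum in the given order, as a sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, num_chunks):
    """Mimic a parallel reduction: sum each chunk, then combine the partial sums."""
    size = math.ceil(len(values) / num_chunks)
    partials = [
        left_to_right_sum(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return left_to_right_sum(partials)


# Illustrative data: the exact sum is 2.0, but each 1.0 is lost
# whenever it is added to a value of magnitude 1e16.
values = [1e16, 1.0, -1e16, 1.0]

sequential = left_to_right_sum(values)  # ((1e16 + 1.0) - 1e16) + 1.0
two_way = chunked_sum(values, 2)        # (1e16 + 1.0) + (-1e16 + 1.0)
reference = math.fsum(values)           # correctly rounded sum

print("sequential:", sequential)
print("two-way reduction:", two_way)
print("math.fsum reference:", reference)
```

On a platform with IEEE 754 doubles, the three printed values differ from one another even though the inputs are identical. Only the grouping of the additions changes, which is what a runtime does when it assigns work to a different number of processing elements or combines partial results in a different order.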

Workshop Scope

The workshop is meant to address the scope of the problems of numerical and computational reproducibility in HPC in general and those anticipated as we scale to Exascale machines in the next decade. We initially seek contributions of extended abstracts (two pages) in the areas of computational reproducibility in HPC from academic, government, and industry stakeholders. Areas of interest include, but are not limited to:

Workshop Format

The workshop will have:

Papers submitted to the workshop will be refereed. The referees will select the papers that will be presented in the workshop. In addition, a group of papers will be published in a special issue of a TBD journal devoted to Computational Reproducibility.

Submissions

Submissions of two-page extended abstracts are sought. The format for the abstracts should follow the IEEE Conference Proceedings format. Templates are available at IEEE - Manuscript Templates for Conference Proceedings. The full papers must be in the format of the International Journal of High Performance Computing Applications. Full papers are limited to 12 pages in length, including figures and the bibliography.

The abstracts are to be submitted as a PDF document using EasyChair at https://easychair.org/conferences/?conf=cre2017

The authors of accepted abstracts will be notified and their manuscripts solicited for a special issue of the International Journal of High Performance Computing Applications (IJHPCA). The organizers are the editors of this special issue, and when solicited, authors will be sent instructions for submitting their manuscripts via the IJHPCA website. Please wait for these special instructions before submitting, or your manuscript may be incorrectly routed at the IJHPCA.

Important Dates

Organizers and Co-Editors of the Special Issue

A special issue of the International Journal of High Performance Computing Applications

Scientific Committee

Contact

E-mail: numerical.reproducibility.at.nist.gov (replace ".at." by "@")