Call for Participation
Numerical Reproducibility at Exascale Workshop (NRE2019)


Where: In cooperation with ISC HPC 2019, Frankfurt, Germany
When: June 20, 2019, 14:00-18:00, in the Marriott Hotel
Web: http://www.cs.fsu.edu/~nre2019
Submit: https://easychair.org/conferences/?conf=nre2019
Deadline: Monday, April 25, 2019
Notifications: Monday, May 6, 2019
Full Papers: TBD by conference publication requirements
Organized by: Walid Keyrouz (NIST) and Michael Mascagni (FSU & NIST)

Overview

Experimental reproducibility is a cornerstone of the scientific method. As computing has grown into a powerful tool for scientific inquiry, computational reproducibility has been one of the core assumptions underlying scientific computing. With “traditional” single-core CPUs, documenting a numerical result was relatively straightforward. However, hardware developments over the past several decades have made it almost impossible to ensure computational reproducibility or to even fully document a computation without incurring a severe loss of performance. This loss of reproducibility started when systems combined parallelism (e.g., clusters) with non-determinism (e.g., single-core CPUs with out-of-order execution). It has accelerated with recent architectural trends towards platforms with increasingly large numbers of processing elements, namely multicore CPUs and compute accelerators (GPUs, Intel Xeon Phi, FPGAs).

Programmers targeting these platforms rely on tools and libraries to produce codes or execute them efficiently. As a result, codes can run efficiently, but their execution details can be impossible to predict and are often very difficult to understand after execution. Furthermore, parallel implementations often result in code whose execution order varies between runs, leading to non-reproducible computations. The underlying reasons are that (1) the hardware and system software allocate parallel work in ways that are not always specifiable at compile time and (2) the execution often proceeds in an opportunistic manner, with the execution order changing between runs. Because floating-point addition and multiplication are not associative, computations that are evaluated in a different order or on different processing elements can produce different results from one run to the next as a matter of course. The predictability of systems is further complicated by two issues that are becoming more critical as systems grow in scale: (1) interconnect systems with latencies that are often outside the control of programmers and (2) reliability, as the mean time between failures (MTBF) is now measured in hours on large systems. This situation seriously affects the ability to rely on scientific computations as a metrological substitute for experimentation!
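The effect of evaluation order on floating-point results can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name left_to_right_sum and the randomly generated test data are hypothetical and do not come from any workshop contribution. Reshuffling a list stands in for the varying order in which partial results arrive in a parallel reduction, and math.fsum is shown as one example of a summation whose result does not depend on input order on typical IEEE 754 platforms.

```python
import math
import random


def left_to_right_sum(values):
    """Accumulate values in the order given, as a naive serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Regrouping three additions changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
grouped_left = (a + b) + c
grouped_right = a + (b + c)
print(grouped_left == grouped_right)  # False

# Hypothetical test data spanning many orders of magnitude.
random.seed(2019)
values = [
    random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
    for _ in range(10_000)
]

naive_sums = set()
exact_sums = set()
for _ in range(20):
    # Each shuffle mimics a different arrival order of partial results.
    random.shuffle(values)
    naive_sums.add(left_to_right_sum(values))
    # math.fsum tracks exact partial sums and rounds once at the end.
    exact_sums.add(math.fsum(values))

print("distinct naive sums:", len(naive_sums))
print("distinct fsum sums: ", len(exact_sums))
```

The naive loop typically yields several distinct sums across the shuffles, while the correctly rounded summation yields one. The price is extra work per element, which is the kind of trade-off between reproducibility and performance described above.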

Previous Offerings

This workshop is the fifth offering in a series that includes the Numerical Reproducibility at Exascale (NRE) workshops (conducted at SC15 and SC16), the panel on Reproducibility held at SC16 (originally a BOF at SC15), and finally Computational Reproducibility at Exascale (CRE), held at SC17 and SC18. All of these addressed different issues in reproducibility that arise when computing at Exascale. We will also be presenting CRE2019 at SC19 and hope to continue both the NRE series at ISC and the CRE series at SC.

Workshop Scope

The workshop is meant to address issues of numerical reproducibility, as well as approaches and best practices for sharing and running code and for the reproducible dissemination of computational results. It covers the problems of computational reproducibility in HPC in general, and those anticipated as we scale up to Exascale machines in the next decade. The participants will include government, academic, and industry stakeholders. The goals of the workshop are to understand the current state of the problems that arise, what work is being done to deal with them, and what the community considers to be the possible approaches to solving them.

We initially seek contributions of extended abstracts (two pages) in the areas of computational reproducibility in HPC from academic, government, and industry stakeholders. Areas of interest include, but are not limited to:

Workshop Format

The workshop will have: (1) two 40-minute plenary talks (30 min + 10 min Q&A each), (2) four 25-minute contributed talks (20 min + 5 min Q&A each), and (3) a 45-min panel discussion to summarize the problem, current research, and prospects on long-term solutions. The table below gives the workshop’s schedule:

Start    Speaker                                   Title
2:00 pm  Michael Mascagni (FSU & NIST, USA)        NRE2019 Program (slides)
2:01 pm  Thorsten Hoefler (ETH, CH)                Performance Reproducibility in HPC and Deep Learning (slides)
2:40 pm  N. Bombace & M. Weiland (EPCC, UK)        A Study on the Performance of Reproducible Computations (slides)
3:05 pm  Thomas Ludwig (DKRZ, D)                   Bitwise Reproducibility with Exascale Machines (slides)
3:30 pm  Coffee break                              Provided by conference
4:00 pm  David R. C. Hill (ISIMA, F)               Reproducibility of Parallel Stochastic Simulations: Enabling Parallel and Sequential Results Comparison Before Scaling on Top Supercomputers (slides)
4:40 pm  B. Lathuilière & F. Févotte (EDF R&D, F)  Verrou: A Tool to Reproduce Floating-Point-Induced Non-Reproducibilities (and help fix them) (slides)
5:05 pm  Michael Mascagni (FSU & NIST, USA)        Three Reproducibility Issues That Can Be Explained As Round-Off Error (slides)
5:30 pm  Panel                                     Panel Discussion on Numerical Reproducibility

Papers submitted to the workshop will be reviewed, and the referees will select those to be presented at the workshop. In addition, a group of papers will be published in the conference proceedings organized by ISC HPC 2019.

Submissions

Submissions of two-page extended abstracts are sought. The format for the abstracts should follow the ISC HPC 2019 requirements, which are those for Springer LNCS proceedings. Full papers will also follow that format. Abstracts are restricted to 2 pages, and full papers to 10 pages.

Abstracts are to be submitted as PDF documents using EasyChair at https://easychair.org/conferences/?conf=nre2019.

Important Dates (Mondays)

April 25, 2019: submission deadline for two-page abstracts via https://easychair.org/conferences/?conf=nre2019
May 06, 2019: notification of authors of the decision on their submissions: rejection, acceptance as a paper, or acceptance as a paper and presentation
TBD: submission deadline for full papers for refereeing

Steering Committee

Contact

E-mail: numerical.reproducibility.at.nist.gov (replace “.at.” by “@”)