Sponsoring Organizations: IMACS, MATCOM

Numerical Reproducibility at Exascale: NRE2016

Synopsis

Where: Part of SC16, Salt Lake City, UT
When: Friday morning, Nov 18, 2016
Submit: https://easychair.org/conferences/?conf=nre2016
Deadline: Monday, September 26, 2016 (was Aug. 29)
Notifications: Monday, October 10, 2016 (was Sep 19)
Full Papers: Monday, December 19, 2016
Organized by: Walid Keyrouz (NIST) and Michael Mascagni (FSU & NIST)
Registration: handled by SC16 (http://sc16.supercomputing.org/)

Motivation and Previous Offerings

This is the second offering of Numerical Reproducibility at Exascale. The first edition was held at SC15, and its webpage can be found here.

A cornerstone of the scientific method is experimental reproducibility. As computation has grown into a powerful tool for scientific inquiry, the assumption of computational reproducibility has been at the heart of numerical analysis in support of scientific computing. With ordinary CPUs supporting a single, serial computation, documenting a numerical result has been a straightforward process. However, as computer hardware continues to develop, it is becoming harder to ensure computational reproducibility, or even to document a given computation completely. This workshop will explore the current state of computational reproducibility in HPC and will seek to organize solutions at different levels. The workshop will conclude with a panel discussion aimed at defining the current state of computational reproducibility for the Exascale.

Agenda

There will be two plenary speakers (30 min each) and 4 contributed talks (15 min presentation + 10 min Q&A). The agenda is shown below.

The workshop will be held in 155-C.

Start Speaker Title
8:30 am   Opening + Admin Issues (agenda)
8:35 am Michela Taufer (U. of Delaware) The Three R's of Work in Scientific Papers: Repeatability, Replicability and Reproducibility (slides)
9:05 am Michael Wolfe (PGI) A Compiler View of Reproducibility (slides)
9:35 am N. Burgess and D. Lutz, ARM Ltd. High-Precision Anchored Accumulators for Reproducible Floating-Point Summation (slides, ARITH24 CFP)
10:00 am Coffee Break
10:30 am M. Bayati, et al., Northeastern U. Identifying Volatile Numeric Expressions in OpenCL Applications (slides)
10:55 am H. Ji, et al., ODU and FSU Comparison of Deterministic and Stochastic Measures of Numerical Reproducibility (slides)
11:20 am R. Iakymchuk, et al., KTH, Paris VI, Perpignan, U. Jaime I Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs (slides)
11:45 am I. Jimenez, et al., UC Santa Cruz, SNL, LLNL, UW-M The Popper Convention: Practical Reproducible Evaluation of Systems (slides)
11:50 am D. Ahn, et al., LLNL, Univ. of Utah, RWTH Aachen U. Coming Soon | A Toolset to Fight Non-deterministic Bugs at Scale (slides)
11:55 am Panel Discussion

Call for Participation

Experimental reproducibility is a cornerstone of the scientific method. As computing has grown into a powerful tool for scientific inquiry, computational reproducibility has been one of the core assumptions underlying scientific computing. With "traditional" single-core CPUs, documenting a numerical result was relatively straightforward. However, hardware developments over the past several decades have made it almost impossible to ensure computational reproducibility or to even fully document a computation without incurring a severe loss of performance. This loss of reproducibility started with CPUs that used out-of-order execution to improve performance. It has accelerated with recent architectural trends towards platforms with increasingly large numbers of processing elements, namely multicore CPUs and compute accelerators (GPUs, Intel Xeon Phi, FPGAs).

Programmers targeting these platforms rely on tools and libraries to produce codes or execute them efficiently. As a result, codes can run efficiently, but have execution details that can be impossible to predict and are often very difficult to understand after execution. Furthermore, parallel implementations often result in code with varying execution orders between runs, leading to nonreproducible computations. The underlying reasons are that (1) the hardware and system software allocate parallel work in ways that are not always specifiable at compile time and (2) the execution often proceeds in an opportunistic manner with the execution order changing between runs. As such, floating-point operations, which are non-associative, can be evaluated in different orders and on different processing elements between runs, so results routinely vary from run to run. The predictability of systems is further complicated by two issues that are becoming more critical as systems grow in scale: (1) interconnect systems with latencies that are often outside the control of programmers and (2) reliability, as the mean time between failures (MTBF) is now measured in hours on large systems. This situation seriously affects the ability to rely on scientific computations as a metrological substitute for experimentation!
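
The order dependence described above can be seen even in a serial setting. The following is a minimal Python sketch, not taken from any workshop contribution: the helper name left_to_right_sum, the seed, and the synthetic test data are illustrative assumptions. It shows that regrouping a floating-point sum can change the result, and that an exactly rounded summation such as Python's math.fsum returns the same value regardless of the order of its inputs.

```python
import math
import random


def left_to_right_sum(values):
    """Plain accumulation in the order given (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Floating-point addition is commutative but not associative:
# regrouping the same three operands changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
grouped_left = (a + b) + c
grouped_right = a + (b + c)

# Synthetic data with widely varying magnitudes (an assumption made
# purely for illustration), summed in two different orders, as might
# happen when parallel work is scheduled differently between runs.
random.seed(2016)
values = [
    random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
    for _ in range(10_000)
]
shuffled = values[:]
random.shuffle(shuffled)

# Naive accumulation usually depends on the order of the operands.
naive_differs = left_to_right_sum(values) != left_to_right_sum(shuffled)

# math.fsum tracks exact partial sums and rounds once at the end,
# so its result does not depend on the order of the operands.
fsum_matches = math.fsum(values) == math.fsum(shuffled)

if __name__ == "__main__":
    print("regrouping changes the sum:", grouped_left != grouped_right)
    print("naive sums differ after shuffling:", naive_differs)
    print("fsum agrees after shuffling:", fsum_matches)
```

Exact summation of this kind costs more than naive accumulation, which is one instance of the trade-off between reproducibility and performance noted earlier.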

Workshop Scope

The workshop is meant to address the scope of the problems of numerical reproducibility in HPC in general and those anticipated as we scale to Exascale machines in the next decade. We initially seek contributions of extended abstracts (two pages) in the areas of computational reproducibility in HPC from academic, government, and industry stakeholders. Areas of interest include, but are not limited to:

Workshop Format

The workshop will have:

Papers submitted to the workshop will be refereed. The referees will select the papers that will be presented in the workshop. In addition, a group of papers will be published in a special issue of Mathematics and Computers in Simulation (MATCOM) devoted to Numerical Reproducibility.

Submissions

Submissions of two-page extended abstracts are sought. The format for the abstracts is not specified, but full papers that are accepted will be published in the MATCOM special issue. The MATCOM instructions for authors can be found here.

The abstracts are to be submitted as a PDF document using Easychair at https://easychair.org/conferences/?conf=nre2016

Travel Support

Some limited travel support may be available via NIST.

Important Dates (all are Mondays)

Organizers and Co-Editors of the MATCOM Special Issue

Scientific Committee

Contact

E-mail: numerical.reproducibity.at.nist.gov (replace ".at." by "@")