SIAM PP20 – The Many Faces of Simulation for HPC Minisymposium

Saturday February 15, 2020

Rafael Ferreira da Silva
University of Southern California, U.S.
Frédéric Suter
CNRS, France

Abstract – In HPC research and development, simulation has mainly been used to evaluate and compare the performance of application implementations and of the algorithms therein. While this use remains critical, many other compelling use cases have emerged, often made possible by recent advances in the simulation methodologies at the core of available simulation frameworks. Examples of new areas in which simulation has become a compelling proposition include debugging and verification, application/simulation co-design, and HPC education. In this multi-part minisymposium, we bring together researchers who have contributed to traditional uses of simulation of HPC systems and applications and who have explored emerging ones. The objective is for them to share their experiences, present recent results, identify areas of convergence, and discuss future directions.

Session 1

10:40-11:00 The Many Faces of Simulation for HPC
Frédéric Suter, CNRS, France;
Rafael Ferreira da Silva, University of Southern California, U.S.

11:05-11:25 Teaching Parallel and Distributed Computing Concepts in Simulation
Henri Casanova, University of Hawaii, U.S.

11:30-11:50 Fast and Faithful Performance Prediction of MPI Applications: the HPL Case Study
Tom Cornebize, Université Grenoble Alpes, France;
Arnaud Legrand, CNRS, France;
Franz Christian Heinrich, Inria, France

11:55-12:15 Power-Aware Scheduling with Slurm: Simulation and Practice
Tapasya Patki, Lawrence Livermore National Laboratory, U.S.

Session 2

1:50-2:10 Faithful Performance Prediction of a Dynamic Task-Based Runtime System, an Opportunity for Task Graph Scheduling
Samuel Thibault, LaBRI, France;
Luka Stanisic, Inria Bordeaux Sud-Ouest, France;
Arnaud Legrand, CNRS, France;
Brice Videau, Inria Grenoble Rhône-Alpes, France;
Jean-François Méhaut, Université Joseph Fourier, France

2:15-2:35 New Horizons for Debugging Long-running Parallel Programs: DMTCP and SimGrid
Gene Cooperman and Rohan Garg, Northeastern University, U.S.

2:40-3:00 Application-simulation co-design for performance and correctness evaluation
Luigi Genovese, CEA, France;
Augustin Degomme, CEA Grenoble, France

3:05-3:25 To Be Determined



Bridging Concepts and Practice in eScience via Simulation-driven Engineering

The CyberInfrastructure (CI) has been the object of intensive research and development in the last decade, resulting in a rich set of abstractions and interoperable software implementations that are used in production today to support ongoing and breakthrough scientific discoveries. A key challenge is the development of tools and application execution frameworks that are robust in current and emerging CI configurations, and that can anticipate the needs of upcoming CI applications. This paper presents WRENCH, a framework that enables simulation-driven engineering for evaluating and developing CI application execution frameworks. WRENCH provides a set of high-level simulation abstractions that serve as building blocks for developing custom simulators. These abstractions rely on the scalable and accurate simulation models provided by the SimGrid simulation framework. Consequently, WRENCH makes it possible to build, with minimal software development effort, simulators that can accurately and scalably simulate a wide spectrum of large and complex CI scenarios. These simulators can then be used to evaluate and/or compare alternative platform, system, and algorithm designs, so as to drive the development of CI solutions for current and emerging applications.
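The building-block style the abstract describes can be illustrated with a toy sketch. The class and method names below are hypothetical, chosen for illustration only; they are not the actual WRENCH (C++) API, and the simple delay model stands in for SimGrid's far more accurate compute and network models.

```python
# Illustrative sketch of composing reusable simulation abstractions into a
# custom simulator, in the spirit of WRENCH (hypothetical names throughout).

class ComputeService:
    """A building block: a compute resource with a given speed (work units/s)."""
    def __init__(self, name, speed):
        self.name = name
        self.speed = speed

    def execute(self, task_work):
        # Simulated execution time under a trivial delay model.
        return task_work / self.speed

class Simulator:
    """Composes services and a scheduling policy into a custom simulator."""
    def __init__(self, services):
        self.services = services

    def run(self, tasks):
        # Greedy policy: send each task to the service that finishes it earliest.
        finish = {s.name: 0.0 for s in self.services}
        for work in tasks:
            best = min(self.services,
                       key=lambda s: finish[s.name] + s.execute(work))
            finish[best.name] += best.execute(work)
        return max(finish.values())  # simulated makespan

sim = Simulator([ComputeService("cluster_a", 100.0),
                 ComputeService("cluster_b", 50.0)])
makespan = sim.run([1000.0, 1000.0, 500.0])
```

Swapping the scheduling policy or the service mix and re-running is exactly the kind of "evaluate and/or compare alternative designs" loop the paper advocates, with WRENCH supplying validated models rather than the toy delay model above.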

Simulation-driven engineering life cycle

Reference to the paper:

  • [PDF] [DOI] R. Ferreira da Silva, H. Casanova, R. Tanaka, and F. Suter, “Bridging Concepts and Practice in eScience via Simulation-driven Engineering,” in Workshop on Bridging from Concepts to Data and Computation for eScience (BC2DC’19), 15th International Conference on eScience (eScience), 2019, p. 609–614.
    title = {Bridging Concepts and Practice in eScience via Simulation-driven Engineering},
    author = {Ferreira da Silva, Rafael and Casanova, Henri and Tanaka, Ryan and Suter, Frederic},
    booktitle = {Workshop on Bridging from Concepts to Data and Computation for eScience (BC2DC'19), 15th International Conference on eScience (eScience)},
    year = {2019},
    pages = {609--614},
    doi = {10.1109/eScience.2019.00084}



Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows

Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on the performance of real-world scientific workflow applications is still not clear. In this paper, we examine this impact using the remote-shared, allocatable burst buffers on the Cori system at NERSC. By running two data-intensive workflows, a high-throughput genome analysis workflow and a subset of the SCEC high-performance CyberShake workflow (a production seismic hazard analysis workflow), we find that burst buffers offer read and write performance improvements of an order of magnitude, and that these improvements translate into faster jobs and thereby faster overall workflows, even for long-running CPU-bound jobs.
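As a back-of-envelope illustration of why an order-of-magnitude I/O speedup helps even CPU-bound jobs, consider a simple staged-I/O model. The bandwidths and data sizes below are illustrative placeholders, not the measured Cori numbers:

```python
# Toy staged-I/O model: a job reads its input, computes, then writes output.

def stage_time(data_gb, bandwidth_gbps):
    """Seconds to move data_gb at bandwidth_gbps (GB/s)."""
    return data_gb / bandwidth_gbps

def job_makespan(read_gb, write_gb, compute_s, bandwidth_gbps):
    return (stage_time(read_gb, bandwidth_gbps) + compute_s
            + stage_time(write_gb, bandwidth_gbps))

pfs_bw, bb_bw = 1.0, 10.0  # GB/s; assume the burst buffer is ~10x the PFS

t_pfs = job_makespan(read_gb=200, write_gb=100, compute_s=600,
                     bandwidth_gbps=pfs_bw)  # 900 s total
t_bb = job_makespan(read_gb=200, write_gb=100, compute_s=600,
                    bandwidth_gbps=bb_bw)    # 630 s total
```

Even for this CPU-bound job (600 s of compute), the 10x I/O speedup cuts total time from 900 s to 630 s, a roughly 30% improvement, which compounds across the many jobs of a workflow.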

I/O Read: performance comparison of read operations with burst buffers (top) and the PFS (bottom) at NERSC.

Reference to the paper:

  • [PDF] [DOI] R. Ferreira da Silva, S. Callaghan, T. M. A. Do, G. Papadimitriou, and E. Deelman, “Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows,” Future Generation Computer Systems, vol. 101, p. 208–220, 2019.
    title = {Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Callaghan, Scott and Do, Tu Mai Anh and Papadimitriou, George and Deelman, Ewa},
    journal = {Future Generation Computer Systems},
    volume = {101},
    pages = {208--220},
    year = {2019},
    doi = {10.1016/j.future.2019.06.016}



Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows

While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of a widely-used application-level model that has been developed and used in the context of scientific workflow executions. To this end, we profile two production scientific workflows on a distributed platform instrumented with power meters. We then conduct an analysis of power and energy consumption measurements. This analysis shows that power consumption is not linearly related to CPU utilization and that I/O operations significantly impact power, and thus energy, consumption. We then propose a power consumption model that accounts for I/O operations, including the impact of waiting for these operations to complete, and for concurrent task executions on multi-socket, multi-core compute nodes. We implement our proposed model as part of a simulator that allows us to draw direct comparisons between real-world and modeled power and energy consumption. We find that our model has high accuracy when compared to real-world executions. Furthermore, our model improves accuracy by about two orders of magnitude when compared to the traditional models used in the energy-efficient workflow scheduling literature.
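The contrast between the two model families can be sketched as follows. The functional forms and coefficients here are illustrative placeholders, not the fitted parameters from the paper; the point is only that a model linear in CPU utilization has no term for I/O, while an I/O-aware model adds one and lets the CPU term be nonlinear.

```python
# Two sketch power models for a task on a compute node (watts).

def power_linear(cpu_util, p_idle=90.0, p_max=150.0):
    """Traditional model: power is linear in CPU utilization."""
    return p_idle + cpu_util * (p_max - p_idle)

def power_io_aware(cpu_util, io_util, p_idle=90.0, p_cpu=60.0,
                   p_io=15.0, alpha=0.75):
    """I/O-aware sketch: a sublinear CPU term plus an explicit I/O term,
    which also covers time spent waiting for I/O operations to complete."""
    return p_idle + p_cpu * (cpu_util ** alpha) + p_io * io_util

def energy_j(power_w, duration_s):
    """Energy in joules is power integrated over the task's duration."""
    return power_w * duration_s

# An I/O-heavy task at 40% CPU utilization: the linear model sees only the
# CPU and misses the draw attributable to I/O and I/O wait.
p_lin = power_linear(0.4)
p_io = power_io_aware(0.4, io_util=0.6)
```

Because energy is power integrated over time, a per-task power error compounds over long executions, which is consistent with the large accuracy gap the paper reports against traditional linear models.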

Per-task power (top) and total energy (bottom) consumption measurements for the Epigenomics map task and the SoyKB haplotype caller and indel realign tasks, as well as estimates obtained with traditional methods (estimation) and with our proposed model (wrench-*).

Reference to the paper:

  • [PDF] [DOI] R. Ferreira da Silva, A. Orgerie, H. Casanova, R. Tanaka, E. Deelman, and F. Suter, “Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows,” in Computational Science – ICCS 2019, 2019, p. 138–152.
    author = {Ferreira da Silva, Rafael and Orgerie, Anne-C\'{e}cile and Casanova, Henri and Tanaka, Ryan and Deelman, Ewa and Suter, Fr\'{e}d\'{e}ric},
    title = {Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows},
    booktitle = {Computational Science -- ICCS 2019},
    year = {2019},
    pages = {138--152},
    publisher = {Springer International Publishing},
    doi = {10.1007/978-3-030-22734-0_11}



Bridging from Concepts to Data and Computation for eScience — BC2DC’19 @eScience19

The Bridging from Concepts to Data and Computation for eScience (BC2DC’19) Workshop will be held in conjunction with eScience’19 on Tuesday September 24, 2019 in San Diego, CA.

Important Dates

  • Papers due: July 10, 2019 July 15, 2019 (extended)
  • Paper Acceptance Notification: July 24, 2019
  • Camera-ready deadline: July 29, 2019
  • Workshop: September 24, 2019

Workshop Description

How can we enable e-Science developers to conceptualize research and translate it to system requirements?
How should we make such processes understandable, reliable, stable and sustainable? 
How should advances in engineering deliver the expanding power of distributed computation, heterogeneous (cloud and data) platforms and the massive – still rapidly growing – wealth of data?
How can we make it easier for organizations and researchers to engage in multiple research collaborations and to adapt rapidly to changing requirements and new opportunities?

Research addressing global challenges federates a growing diversity of disciplines, requires sustained contributions from many autonomous organizations and builds on heterogeneous evolving computational platforms. Scientific knowledge is scattered across cloud-based services, local storage, and in source code targeting specific architectures and computational contexts. Concepts reflected in disparate sources are hardly computer-communicable and computer-actionable across or even within disciplines. This makes traceability, communication of methods, provenance gathering and reusing data and methods harder and more time-consuming. Agile response to new needs and opportunities may be accelerated when the available methods and required components have mutually comprehensible descriptions. Commercial clouds play an increasingly important role in large-scale scientific experimentation. Examples of diversity in technology and jurisdiction, as well as in the large-scale exploitation of clouds can be found on both sides of the Atlantic: in the European Open Science Cloud (EOSC) as well as in the ongoing massive migration of data and other resources onto Amazon’s AWS by NASA.

It follows that while the potential for large-scale data-driven experimentation increases, so does complexity, as well as the risk of getting locked into vendor-specific solutions. To deal with these challenges and to help researchers make better and more transparent use of diverse infrastructures, many systems propose higher-level abstractions to hide and orchestrate infrastructural and implementation details. Domain experts need to directly control sophisticated and dynamic concepts pertaining to data, execution contexts and diverse e-infrastructures. Furthermore, they need mechanisms that allow them to take responsibility for the quality of results, without the distraction of technological artefacts.

These often take the form of service-based platforms, containerised solutions, APIs, ontological descriptions of underlying resources, provenance repositories, etc. This workshop focuses on platform-driven and domain-specific developments that contribute towards unifying underlying platforms, clouds, data, computational resources and concepts in order to empower research developers to deliver, maintain and communicate larger, increasingly complex eScience systems.

In particular we welcome contributions in the following areas, not excluding other topics of interest:

  • Semantic concept description and implementation 
  • Specification and execution of conceptually formulated methods
  • Component descriptions facilitating reliable composition
  • Architectures, frameworks and design patterns delivering flexible use and incremental composition
  • Cloud, fog, edge and specialized platforms
  • Pervasive and persistent provenance 
  • Platforms of platforms, containers, orchestration and microservices
  • HPC computing over Cloud



14th Workflows in Support of Large-Scale Science – WORKS 2019 @SC19

The 14th Workflows in Support of Large-Scale Science (WORKS) Workshop will be held in conjunction with SC19 on Sunday November 17, 2019 in Denver, CO.

Important Dates

  • Papers due: July 15, 2019 August 26, 2019 (final extension)
  • Paper Acceptance Notification: September 1, 2019
  • E-copyright registration completed by authors: October 1, 2019
  • Camera-ready deadline: October 1, 2019
  • Workshop: November 17, 2019

Workshop Description

Data-intensive workflows (a.k.a. scientific workflows) are routinely used in most scientific disciplines today, especially in the context of parallel and distributed computing. Workflows provide a systematic way of describing the analysis and rely on workflow management systems to execute the complex analyses on a variety of distributed resources. They sit at the interface between end-users and computing infrastructures. With the dramatic increase of raw data volume in every domain, they play an even more critical role in assisting scientists in organizing and processing their data and in leveraging HPC or HTC resources; workflows played an important role, for example, in the discovery of gravitational waves.

This workshop focuses on the many facets of data-intensive workflow management systems, ranging from job execution to service management and the coordination of data, service and job dependencies. The workshop therefore covers a broad range of issues in the scientific workflow lifecycle that include: data-intensive workflows representation and enactment; designing workflow composition interfaces; workflow mapping techniques that may optimize the execution of the workflow; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows such as semantic technologies, compiler methods, fault detection and tolerance.



WORKS 2018 Proceedings

The proceedings of the 13th Workflows in Support of Large-Scale Science Workshop are now available.

The Workshop on Workflows in Support of Large-Scale Science (WORKS) has been established as a premier forum on all aspects of scientific workflows. The 13th edition of the workshop, WORKS 2018, co-located with SC18 in Dallas, Texas, USA, followed the successful tradition of previous years (Paris, 2006; Monterey Bay, 2007; Austin, 2008; Portland, 2009; New Orleans, 2010; Seattle, 2011; Salt Lake City, 2012; Denver, 2013; New Orleans, 2014; Austin, 2015; Salt Lake City, 2016; Denver, 2017). WORKS 2018 focused on the many facets of data-intensive workflow management systems, ranging from data-intensive workflow representation and enactment to scheduling and resource allocation, provenance, workflow enactment on heterogeneous architectures, and workflow tools and support environments.

The Call for Papers attracted 19 submissions. After a rigorous review process where each paper received at least two reviews, 8 papers were accepted for a full presentation (acceptance rate 42%). The workshop took place on Sunday, 11 November 2018 and the program also featured an invited keynote by Ilkay Altintas, three lightning talks and a panel discussion. The proceedings include the eight full papers accepted and presented at the workshop. We would like to thank the authors, the presenters, the general chair, the steering committee, the publicity chairs, the program committee, and the SC18 workshop chairs – without their excellent work and contributions WORKS would not be successful.

List of papers published:

