13th Workshop on Workflows in Support of Large-Scale Science – WORKS @SC18

Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis

Data-intensive workflows (a.k.a. scientific workflows) are routinely used in most scientific disciplines today, especially in the context of high-performance, parallel and distributed computing. They provide a systematic way of describing a complex scientific process and rely on sophisticated workflow management systems to execute on a variety of parallel and distributed resources. With the dramatic increase of raw data volume in every domain, they play an even more critical role in helping scientists organize and process their data and leverage HPC or HTC resources, sitting at the interface between end users and computing infrastructures.

This workshop focuses on the many facets of data-intensive workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: data-intensive workflow representation and enactment; the design of workflow composition interfaces; workflow mapping techniques to optimize the execution of the workflow for different infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.

Important Dates
Papers due: July 30, 2018
Paper Acceptance Notification: September 9, 2018

The Interplay of Workflow Execution and Resource Provisioning

Presentation held at the 18th SIAM Conference on Parallel Processing for Scientific Computing, 2018
Resource Management, Scheduling, Workflows: Critical Middleware for HPC and Clouds
Tokyo, Japan

Abstract – This talk will examine issues of workflow execution, in particular using the Pegasus Workflow Management System, on distributed resources and how these resources can be provisioned ahead of the workflow execution. Pegasus was designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. In some cases, it is beneficial to provision the resources ahead of the workflow execution, enabling the re-use of resources across workflow tasks. The talk will examine the benefits of resource provisioning for workflow execution.




On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows

Presentation held at the 12th Workshop on Workflows in Support of Large-Scale Science, 2017
Denver, CO, USA – SuperComputing’17

Abstract – Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance for real-world workflow applications is not always clear. In this paper, we examine the impact of burst buffers through the remote-shared, allocatable burst buffers on the Cori system at NERSC. By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, we find that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.


Related Publication

  • R. Ferreira da Silva, S. Callaghan, and E. Deelman, “On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows,” in 12th Workshop on Workflows in Support of Large-Scale Science (WORKS’17), 2017. doi: 10.1145/3150994.3151000




A Characterization of Workflow Management Systems for Extreme-Scale Applications


Automation of the execution of computational tasks is at the heart of improving scientific productivity. Scientific workflows have supported breakthroughs across several domains such as astronomy, physics, climate science, earthquake science, biology, and others. Scientific workflow management systems (WMS) are critical automation components that enable efficient and robust workflow execution across heterogeneous infrastructures.

In this paper, we seek to understand the requirements and characteristics of state-of-the-art WMSs for extreme-scale applications. We evaluate and classify 15 popular workflow systems designed specifically for extreme-scale workflows, along with the applications they support. We survey and classify workflow properties and management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper identifies a number of properties that future WMSs need to support in order to meet extreme-scale requirements, as well as research gaps in the state of the art.

This paper has been published in Future Generation Computer Systems.


Characterization of state-of-the-art WMSs. The classification highlights relevant characteristics to attain extreme-scale.

Abstract – Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last few years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.


Reference to the paper:

  • R. Ferreira da Silva, R. Filgueira, I. Pietri, M. Jiang, R. Sakellariou, and E. Deelman, “A Characterization of Workflow Management Systems for Extreme-Scale Applications,” Future Generation Computer Systems, vol. 75, pp. 228–238, 2017. doi: 10.1016/j.future.2017.02.026




