WRENCH: Workflow Management System Simulation Workbench


Abstract – WRENCH enables novel avenues for scientific workflow use, research, development, and education. WRENCH capitalizes on recent and critical advances in the state of the art of distributed platform/application simulation. WRENCH builds on top of the open-source SimGrid simulation framework. SimGrid enables the simulation of large-scale distributed applications in a way that is accurate (via validated simulation models), scalable (low ratio of simulation time to simulated time, ability to run large simulations on a single computer with low compute, memory, and energy footprints), and expressive (ability to simulate arbitrary platform, application, and execution scenarios). WRENCH provides directly usable high-level simulation abstractions using SimGrid as a foundation. More information is available at https://wrench-project.org.

In a nutshell, WRENCH makes it possible to:

  • Prototype implementations of Workflow Management System (WMS) components and underlying algorithms;
  • Quickly, scalably, and accurately simulate arbitrary workflow and platform scenarios for a simulated WMS implementation; and
  • Run extensive experimental campaigns to conclusively compare workflow executions, platform architectures, and WMS algorithms and designs.
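
As a rough illustration of what simulating a workflow execution involves, the following is a minimal sketch in plain Python. It does not use the WRENCH or SimGrid APIs; the function name, the task names and runtimes, and the greedy list-scheduling policy on identical hosts are all assumptions made for this example.

```python
import heapq


def simulate_makespan(runtimes, parents, num_hosts):
    """Simulate a task DAG on identical hosts with greedy list scheduling.

    runtimes: dict mapping task name -> runtime in seconds (hypothetical values)
    parents:  dict mapping task name -> list of tasks it depends on
    Returns the simulated makespan in seconds.
    """
    # Count unfinished parents per task and record each task's children.
    remaining = {task: len(parents.get(task, [])) for task in runtimes}
    children = {task: [] for task in runtimes}
    for task, deps in parents.items():
        for dep in deps:
            children[dep].append(task)

    ready = sorted(task for task, count in remaining.items() if count == 0)
    running = []  # min-heap of (finish_time, task)
    clock = 0.0
    free_hosts = num_hosts
    finished = 0

    while finished < len(runtimes):
        # Start as many ready tasks as there are free hosts.
        while ready and free_hosts > 0:
            task = ready.pop(0)
            heapq.heappush(running, (clock + runtimes[task], task))
            free_hosts -= 1
        if not running:
            raise ValueError("no runnable task: cyclic dependencies or no hosts")
        # Advance simulated time to the next task completion.
        clock, done = heapq.heappop(running)
        free_hosts += 1
        finished += 1
        for child in children[done]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return clock


# A hypothetical four-task "diamond" workflow.
runtimes = {"split": 2, "a": 3, "b": 5, "merge": 1}
parents = {"a": ["split"], "b": ["split"], "merge": ["a", "b"]}
two_hosts = simulate_makespan(runtimes, parents, num_hosts=2)
one_host = simulate_makespan(runtimes, parents, num_hosts=1)
```

Comparing `two_hosts` with `one_host` is a tiny version of the kind of experimental campaign described above: the same workflow is replayed on different platform configurations without running anything for real. A real simulator would also model network, storage, and scheduling overheads, which this sketch ignores.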

 


13th Workflows in Support of Large-Scale Science – WORKS @SC18


Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis

Data-intensive workflows (a.k.a. scientific workflows) are routinely used in most scientific disciplines today, especially in the context of high-performance, parallel and distributed computing. They provide a systematic way of describing a complex scientific process and rely on sophisticated workflow management systems to execute on a variety of parallel and distributed resources. With the dramatic increase of raw data volume in every domain, they play an even more critical role in helping scientists organize and process their data and leverage HPC or HTC resources, sitting at the interface between end-users and computing infrastructures.

This workshop focuses on the many facets of data-intensive workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: data-intensive workflow representation and enactment; designing workflow composition interfaces; workflow mapping techniques to optimize the execution of the workflow for different infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows such as semantic technologies, compiler methods, scheduling and fault detection and tolerance.


Important Dates
Papers due: August 13, 2018 (extended from July 30)
Paper Acceptance Notification: September 21, 2018 (moved from September 9)

 

 


The Interplay of Workflow Execution and Resource Provisioning


Presentation held at the 18th SIAM Conference on Parallel Processing for Scientific Computing, 2018
Resource Management, Scheduling, Workflows: Critical Middleware for HPC and Clouds
Tokyo, Japan

Abstract – This talk will examine issues of workflow execution, in particular using the Pegasus Workflow Management System, on distributed resources and how these resources can be provisioned ahead of the workflow execution. Pegasus was designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. In some cases, it is beneficial to provision the resources ahead of the workflow execution, enabling the re-use of resources across workflow tasks. The talk will examine the benefits of resource provisioning for workflow execution.

 


On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows


Presentation held at the 12th Workshop on Workflows in Support of Large-Scale Science, 2017
Denver, CO, USA – SuperComputing’17

Abstract – Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance for real-world workflow applications is not always clear. In this paper, we examine the impact of burst buffers through the remote-shared, allocatable burst buffers on the Cori system at NERSC. By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, we find that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.
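
One way to see why faster I/O can still help a mostly CPU-bound job is an Amdahl's-law style estimate. The sketch below is not taken from the paper; the function name and the I/O fractions used are hypothetical and chosen only for illustration.

```python
def job_speedup(io_fraction, io_speedup):
    """Overall job speedup when only the I/O portion of the runtime gets faster.

    io_fraction: share of the original runtime spent on I/O (0.0 to 1.0)
    io_speedup:  factor by which I/O becomes faster (10.0 = an order of magnitude)
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


# Hypothetical jobs: one spends 10% of its time on I/O, the other 50%.
cpu_bound = job_speedup(io_fraction=0.10, io_speedup=10.0)
io_bound = job_speedup(io_fraction=0.50, io_speedup=10.0)
```

Under these assumptions, a job that spends only a tenth of its time on I/O still finishes roughly 10% sooner, while an I/O-heavy job gains considerably more.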

 

Related Publication

  • [PDF] [DOI] R. Ferreira da Silva, S. Callaghan, and E. Deelman, “On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows,” in 12th Workshop on Workflows in Support of Large-Scale Science (WORKS’17), 2017.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2017,
    title = {On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Callaghan, Scott and Deelman, Ewa},
    booktitle = {12th Workshop on Workflows in Support of Large-Scale Science (WORKS'17)},
    year = {2017},
    pages = {},
    doi = {10.1145/3150994.3151000}
    }

 


A Characterization of Workflow Management Systems for Extreme-Scale Applications

 

Automation of the execution of computational tasks is at the heart of improving scientific productivity. Scientific workflows have supported breakthroughs across several domains such as astronomy, physics, climate science, earthquake science, biology, and others. Scientific workflow management systems (WMS) are critical automation components that enable efficient and robust workflow execution across heterogeneous infrastructures.

In this paper, we seek to understand the requirements and characteristics of state-of-the-art WMSs for extreme-scale applications. We evaluate and classify 15 popular workflow systems, and the applications they support, that are designed specifically for extreme-scale workflows. We survey and classify workflow properties and management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper identifies a number of properties that future WMSs need to support in order to meet extreme-scale requirements, as well as research gaps in the state of the art.

This paper has been published in Future Generation Computer Systems (DOI: 10.1016/j.future.2017.02.026).

 

Characterization of state-of-the-art WMSs. The classification highlights relevant characteristics to attain extreme-scale.

Abstract – Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

 

Reference to the paper:

  • [PDF] [DOI] R. Ferreira da Silva, R. Filgueira, I. Pietri, M. Jiang, R. Sakellariou, and E. Deelman, “A Characterization of Workflow Management Systems for Extreme-Scale Applications,” Future Generation Computer Systems, vol. 75, p. 228–238, 2017.
    [Bibtex]
    @article{ferreiradasilva-fgcs-2017,
    title = {A Characterization of Workflow Management Systems for Extreme-Scale Applications},
    author = {Ferreira da Silva, Rafael and Filgueira, Rosa and Pietri, Ilia and Jiang, Ming and Sakellariou, Rizos and Deelman, Ewa},
    journal = {Future Generation Computer Systems},
    volume = {75},
    number = {},
    pages = {228--238},
    year = {2017},
    doi = {10.1016/j.future.2017.02.026}
    }

 


Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows


Presentation held at the 11th Workflows in Support of Large-Scale Science, 2016
Salt Lake City, UT, USA – SuperComputing’16

Abstract – Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. In spite of many success stories, a key challenge for running workflows in distributed systems is failure prediction, detection, and recovery. In this paper, we propose an approach to use control theory developed as part of autonomic computing to predict failures before they happen, and mitigate them when possible. The proposed approach applies the proportional-integral-derivative controller (PID controller) control loop mechanism, which is widely used in industrial control systems, to mitigate faults by adjusting the inputs of the controller. The PID controller aims at detecting the possibility of a fault far enough in advance so that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of the Big Data era: data storage overload and memory overflow. We define, implement, and evaluate simple PID controllers to autonomously manage data and memory usage of a bioinformatics workflow that consumes/produces over 4.4TB of data, and requires over 24TB of memory to run all tasks concurrently. Experimental results indicate that workflow executions may significantly benefit from PID controllers, in particular under online and unknown conditions. Simulation results show that nearly optimal executions (slowdown of 1.01) can be attained when using our proposed method, and faults are detected and mitigated far in advance of their occurrence.
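
For readers unfamiliar with the mechanism, the following is a minimal sketch of a textbook discrete PID control loop in Python. It is not the controller implemented in the paper; the class name, the gains, and the setpoint are hypothetical values chosen for illustration.

```python
class PIDController:
    """Textbook discrete PID controller (illustrative, not the paper's code)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp = kp              # proportional gain
        self.ki = ki              # integral gain
        self.kd = kd              # derivative gain
        self.setpoint = setpoint  # desired value, e.g. a target storage usage
        self.integral = 0.0
        self.previous_error = None

    def update(self, measurement, dt=1.0):
        """Return the control output for the latest measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        if self.previous_error is None:
            derivative = 0.0  # no history yet on the first sample
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and a setpoint of 10 units of storage usage.
pid = PIDController(kp=2.0, ki=0.5, kd=1.0, setpoint=10.0)
first = pid.update(8.0)   # usage below the setpoint
second = pid.update(9.0)  # usage approaching the setpoint
```

How the output is turned into an action is a separate design choice. One plausible mapping, assumed here rather than taken from the paper, is to treat a positive output as headroom for launching more tasks and a negative output as a signal to pause submissions or trigger data cleanup before the storage or memory limit is reached.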

 

Related Publication

  • [PDF] R. Ferreira da Silva, R. Filgueira, E. Deelman, E. Pairo-Castineira, I. M. Overton, and M. Atkinson, “Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows,” in 11th Workflows in Support of Large-Scale Science, 2016, p. 15–24.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2016,
    author = {Ferreira da Silva, Rafael and Filgueira, Rosa and Deelman, Ewa and Pairo-Castineira, Erola and Overton, Ian Michael and Atkinson, Malcolm},
    title = {Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows},
    year = {2016},
    booktitle = {11th Workflows in Support of Large-Scale Science},
    series = {WORKS'16},
    pages = {15--24}
    }

 


Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows


Presentation held at the Workshop of Environmental Computing Applications, 2016
Baltimore, MD, USA – IEEE 12th International Conference on eScience

Abstract – In order to support the computational and data needs of today’s science, new knowledge must be gained on how to deliver the growing capabilities of the national cyberinfrastructures and more recently commercial clouds to the scientist’s desktop in an accessible, reliable, and scalable way. In over a decade of working with domain scientists, the Pegasus workflow management system has been used by researchers to model seismic wave propagation, to discover new celestial objects, to study RNA critical to human brain development, and to investigate other important research questions. Recently, the Pegasus and the dispel4py teams have collaborated to enable automated processing of real-time seismic interferometry and earthquake “repeater” analysis using data collected from the IRIS database. The proposed integrated solution empowers real-time stream-based workflows to seamlessly run on different distributed infrastructures (or in the wide area), where data is automatically managed by a task-oriented workflow system, which orchestrates the distributed execution. We have demonstrated the feasibility of this approach by using Docker containers to deploy the workflow management systems and two different computing infrastructures: an Apache Storm cluster for real-time processing, and an MPI-based cluster for shared memory computing. Stream-based execution is managed by dispel4py, while the data movement between the clusters and the workflow engine (submit host) is managed by Pegasus.

 

Related Publication

  • [PDF] [DOI] R. Ferreira da Silva, E. Deelman, R. Filgueira, K. Vahi, M. Rynge, R. Mayani, and B. Mayer, “Automating Environmental Computing Applications with Scientific Workflows,” in Environmental Computing Workshop, IEEE 12th International Conference on e-Science, 2016, p. 400–406.
    [Bibtex]
    @inproceedings{ferreiradasilva-ecw-2016,
    author = {Ferreira da Silva, Rafael and Deelman, Ewa and Filgueira, Rosa and Vahi, Karan and Rynge, Mats and Mayani, Rajiv and Mayer, Benjamin},
    title = {Automating Environmental Computing Applications with Scientific Workflows},
    year = {2016},
    booktitle = {Environmental Computing Workshop, IEEE 12th International Conference on e-Science},
    series = {ECW'16},
    doi = {10.1109/eScience.2016.7870926},
    pages = {400--406}
    }

 


Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services


Presentation held at the 18th Workshop on Advances in Parallel and Distributed Computational Models, 2016
Chicago, IL, USA – 30th IEEE International Parallel and Distributed Processing Symposium

Abstract – Scientific workflows have become mainstream for conducting large-scale scientific research. In the meantime, cloud computing has emerged as an alternative computing paradigm. In this paper, we conduct an analysis of the performance of an I/O-intensive real scientific workflow on cloud environments using makespan (the turnaround time for a workflow to complete its execution) as the key performance metric. In particular, we assess the impact of varying the storage configurations on workflow performance when executing on Google Cloud and Amazon Web Services. We aim to understand the performance bottlenecks of the popular cloud-based execution environments. Experimental results show significant differences in application performance for different configurations. They also reveal that Amazon Web Services outperforms Google Cloud with equivalent application and system configurations. We then investigate the root cause of these results using provenance data and by benchmarking disk and network I/O on both infrastructures. Lastly, we also suggest modifications in the standard cloud storage APIs, which will reduce the makespan for I/O-intensive workflows.
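
The makespan metric defined above can be computed directly from per-task timestamps. The sketch below assumes a hypothetical record format of (start, end) pairs in seconds; it is not the provenance format used in the paper.

```python
def makespan(records):
    """Return the workflow makespan from (start, end) task timestamps.

    The makespan is the time between the earliest task start and the
    latest task end, regardless of how tasks overlap in between.
    """
    records = list(records)
    if not records:
        raise ValueError("at least one task record is required")
    earliest_start = min(start for start, _ in records)
    latest_end = max(end for _, end in records)
    return latest_end - earliest_start


# Hypothetical timestamps (seconds) for three partly overlapping tasks.
task_records = [(0, 40), (10, 95), (50, 120)]
total = makespan(task_records)
```

Note that the makespan is not the sum of task runtimes: overlapping tasks shorten it, and idle gaps between tasks (for example, while waiting on storage) lengthen it.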

 

Related Publication

  • [PDF] [DOI] H. Nawaz, G. Juve, R. Ferreira da Silva, and E. Deelman, “Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services,” in 18th Workshop on Advances in Parallel and Distributed Computational Models, 2016, p. 535–544.
    [Bibtex]
    @inproceedings{nawaz-apdcm-2016,
    author = {Nawaz, Hassan and Juve, Gideon and Ferreira da Silva, Rafael and Deelman, Ewa},
    title = {Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services},
    booktitle = {18th Workshop on Advances in Parallel and Distributed Computational Models},
    series = {APDCM'16},
    year = {2016},
    doi = {10.1109/IPDPSW.2016.90},
    pages = {535--544}
    }

 


Pegasus: automate, recover, and debug scientific computations


  • Automate scientific computational work as portable workflows. Pegasus automatically locates the necessary input data and computational resources, and manages storage space for executing data-intensive workflows on storage-constrained resources.
  • Recover from failures at runtime. Tasks are automatically retried in the presence of errors. A rescue workflow containing a description of only the work that remains is provided. Provenance is also captured (data, software, parameters, etc.).
  • Debug failures in computations using a set of system-provided debugging tools and an online workflow monitoring dashboard.
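
The retry and rescue behavior described above can be sketched generically as follows. This is not Pegasus code and does not use any Pegasus API; the function name, the task callables, and the simplification to a linear task order (rather than a full DAG) are assumptions made for illustration.

```python
def run_with_retries(tasks, order, max_retries=3):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    order: list of task names, already sorted so dependencies come first
    Returns (completed, rescue), where rescue lists only the work that remains.
    """
    completed = []
    for index, name in enumerate(order):
        for _attempt in range(1 + max_retries):
            try:
                tasks[name]()
            except Exception:
                continue  # transient failure assumed: try again
            completed.append(name)
            break
        else:
            # Every attempt failed: stop and report the remaining work.
            return completed, order[index:]
    return completed, []


# Hypothetical tasks: one fails twice before succeeding, one always fails.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient error")


def broken():
    raise RuntimeError("permanent error")


done, rescue = run_with_retries(
    {"stage_in": flaky, "compute": broken, "stage_out": lambda: None},
    ["stage_in", "compute", "stage_out"],
    max_retries=2,
)
```

Resubmitting only the `rescue` list after the underlying problem is fixed avoids repeating work that already completed, which is the point of a rescue workflow.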

 

Related Publications

  • [DOI] E. Deelman, K. Vahi, M. Rynge, G. Juve, R. Mayani, and R. Ferreira da Silva, “Pegasus in the Cloud: Science Automation through Workflow Technologies,” IEEE Internet Computing, vol. 20, iss. 1, p. 70–76, 2016.
    [Bibtex]
    @article{deelman-ic-2016,
    title = {Pegasus in the Cloud: Science Automation through Workflow Technologies},
    author = {Deelman, Ewa and Vahi, Karan and Rynge, Mats and Juve, Gideon and Mayani, Rajiv and Ferreira da Silva, Rafael},
    journal = {{IEEE} Internet Computing},
    volume = {20},
    number = {1},
    pages = {70--76},
    year = {2016},
    doi = {10.1109/MIC.2016.15}
    }
  • [PDF] [DOI] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, and K. Wenger, “Pegasus, a Workflow Management System for Science Automation,” Future Generation Computer Systems, vol. 46, p. 17–35, 2015.
    [Bibtex]
    @article{deelman-fgcs-2015,
    title = {Pegasus, a Workflow Management System for Science Automation},
    journal = {Future Generation Computer Systems},
    volume = {46},
    number = {0},
    pages = {17--35},
    year = {2015},
    doi = {10.1016/j.future.2014.10.008},
    author = {Deelman, Ewa and Vahi, Karan and Juve, Gideon and Rynge, Mats and Callaghan, Scott and Maechling, Phil J. and Mayani, Rajiv and Chen, Weiwei and Ferreira da Silva, Rafael and Livny, Miron and Wenger, Kent},
    }


Task Resource Consumption Prediction for Scientific Applications and Workflows


Presentation held at the Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems, 2015
Dagstuhl, Germany

Abstract – Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable scientific application executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile real scientific applications and workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict task characteristics of scientific applications based on the collected data. For scientific workflows, we propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process results in much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.
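
The difference between offline and online estimation can be illustrated with a deliberately simple stand-in. The sketch below replaces the paper's correlation analysis and recursive partitioning with a plain running mean; the class and function names and the runtime values are hypothetical.

```python
class OnlineEstimator:
    """Running-mean estimator refined as task executions are monitored."""

    def __init__(self, initial_estimate):
        self.estimate = float(initial_estimate)  # prior, used before any data
        self.count = 0

    def predict(self):
        return self.estimate

    def observe(self, value):
        """Fold one monitored value (e.g. a task runtime) into the estimate."""
        self.count += 1
        if self.count == 1:
            self.estimate = float(value)  # first observation replaces the prior
        else:
            self.estimate += (value - self.estimate) / self.count


def compare_errors(initial_estimate, observed):
    """Return (offline_mae, online_mae) over a sequence of observed values."""
    online = OnlineEstimator(initial_estimate)
    offline_total = 0.0
    online_total = 0.0
    for value in observed:
        offline_total += abs(initial_estimate - value)  # estimate never updated
        online_total += abs(online.predict() - value)   # predict, then learn
        online.observe(value)
    return offline_total / len(observed), online_total / len(observed)


# Hypothetical runtimes (seconds) with a poor initial guess of 100 seconds.
offline_mae, online_mae = compare_errors(100.0, [10.0, 20.0, 30.0])
```

In this toy case the online error is less than half the offline error, because each prediction benefits from the tasks already monitored. The monitor, analyze, plan, and execute phases of a MAPE-K loop are collapsed here into the single `predict`/`observe` cycle.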

 

Related Publications

  • [PDF] [DOI] R. Ferreira da Silva, G. Juve, M. Rynge, E. Deelman, and M. Livny, “Online Task Resource Consumption Prediction for Scientific Workflows,” Parallel Processing Letters, vol. 25, iss. 3, 2015.
    [Bibtex]
    @article{ferreiradasilva-ppl-2015,
    title = {Online Task Resource Consumption Prediction for Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Juve, Gideon and Rynge, Mats and Deelman, Ewa and Livny, Miron},
    journal = {Parallel Processing Letters},
    volume = {25},
    number = {3},
    pages = {},
    year = {2015},
    doi = {10.1142/S0129626415410030}
    }
  • [PDF] [DOI] R. Ferreira da Silva, M. Rynge, G. Juve, I. Sfiligoi, E. Deelman, J. Letts, F. Würthwein, and M. Livny, “Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid (CMS) Experiment at LHC,” Procedia Computer Science, vol. 51, p. 39–48, 2015.
    [Bibtex]
    @article{ferreiradasilva-iccs-2015,
    title = {Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid ({CMS}) Experiment at {LHC}},
    author = {Ferreira da Silva, Rafael and Rynge, Mats and Juve, Gideon and Sfiligoi, Igor and Deelman, Ewa and Letts, James and W\"urthwein, Frank and Livny, Miron},
    journal = {Procedia Computer Science},
    year = {2015},
    volume = {51},
    pages = {39--48},
    note = {International Conference On Computational Science, \{ICCS\} 2015 Computational Science at the Gates of Nature},
    doi = {10.1016/j.procs.2015.05.190}
    }
  • [PDF] [DOI] R. Ferreira da Silva, G. Juve, E. Deelman, T. Glatard, F. Desprez, D. Thain, B. Tovar, and M. Livny, “Toward fine-grained online task characteristics estimation in scientific workflows,” in 8th Workshop on Workflows in Support of Large-Scale Science, 2013, p. 58–67.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2013,
    author = {Ferreira da Silva, Rafael and Juve, Gideon and Deelman, Ewa and Glatard, Tristan and Desprez, Fr{\'e}d{\'e}ric and Thain, Douglas and Tovar, Benjamin and Livny, Miron},
    title = {Toward fine-grained online task characteristics estimation in scientific workflows},
    booktitle = {8th Workshop on Workflows in Support of Large-Scale Science},
    series = {WORKS '13},
    year = {2013},
    pages = {58--67},
    doi = {10.1145/2534248.2534254},
    }

 
