13th Workflows in Support of Large-Scale Science – WORKS @SC18

Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis

Data-intensive workflows (a.k.a. scientific workflows) are routinely used in most scientific disciplines today, especially in the context of high-performance, parallel and distributed computing. They provide a systematic way of describing a complex scientific process and rely on sophisticated workflow management systems to execute on a variety of parallel and distributed resources. With the dramatic increase of raw data volume in every domain, they play an even more critical role in helping scientists organize and process their data and leverage HPC or HTC resources, sitting at the interface between end users and computing infrastructures.

This workshop focuses on the many facets of data-intensive workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle that include: data-intensive workflow representation and enactment; designing workflow composition interfaces; workflow mapping techniques to optimize the execution of the workflow for different infrastructures; workflow enactment engines that need to deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.
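Many workflow management systems represent such a process as a directed acyclic graph (DAG) of tasks linked by data or job dependencies. The sketch below is illustrative only and is not the representation used by any particular system. It uses Python's standard `graphlib` module (Python 3.9+) to group a small workflow into "waves" of tasks whose dependencies are already satisfied, so that each wave could run concurrently. The task names and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "split": set(),
    "align_1": {"split"},
    "align_2": {"split"},
    "merge": {"align_1", "align_2"},
    "report": {"merge"},
}


def execution_waves(dag):
    """Group tasks into waves. Tasks in the same wave have no dependencies
    on each other, so a workflow engine could dispatch them concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished to unlock their successors
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(workflow)):
        print(i, wave)
```

A real engine would release each task as soon as its own parents finish, without waiting for the whole wave. Waves are used here only because they make the dependency structure easy to read.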

Important Dates
Papers due: August 13, 2018 (extended from July 30, 2018)
Paper Acceptance Notification: September 25, 2018 (originally September 9, 2018)

The Interplay of Workflow Execution and Resource Provisioning

Presentation held at the 18th SIAM Conference on Parallel Processing for Scientific Computing, 2018
Resource Management, Scheduling, Workflows: Critical Middleware for HPC and Clouds
Tokyo, Japan

Abstract – This talk will examine issues of workflow execution, in particular using the Pegasus Workflow Management System, on distributed resources and how these resources can be provisioned ahead of the workflow execution. Pegasus was designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. In some cases, it is beneficial to provision the resources ahead of the workflow execution, enabling the re-use of resources across workflow tasks. The talk will examine the benefits of resource provisioning for workflow execution.
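A deliberately simple cost model can illustrate why re-using provisioned resources across tasks can pay off. The model rests on several assumptions. Each task submitted as its own batch job waits a fixed queue time before it starts. Tasks run one after another on a single slot. A resource provisioned ahead of time (for example, through a pilot job) pays that wait only once. The model, the numbers, and the function names are hypothetical. This is not how Pegasus models or schedules workflows.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime_s: float  # hypothetical task runtime in seconds


def per_task_makespan(tasks: list[Task], queue_wait_s: float) -> float:
    """Toy model: every task is a separate batch job and pays the queue wait."""
    return sum(queue_wait_s + t.runtime_s for t in tasks)


def provisioned_makespan(tasks: list[Task], queue_wait_s: float) -> float:
    """Toy model: one resource is provisioned up front and reused by every task."""
    return queue_wait_s + sum(t.runtime_s for t in tasks)


if __name__ == "__main__":
    # Hypothetical numbers: 10 tasks of 60 s each, 300 s queue wait per submission.
    tasks = [Task(f"t{i}", 60.0) for i in range(10)]
    print(per_task_makespan(tasks, 300.0))     # 3600.0
    print(provisioned_makespan(tasks, 300.0))  # 900.0
```

The gap widens as tasks get shorter relative to the queue wait. Real systems also pay costs this model ignores, such as idle provisioned capacity when the workflow has fewer ready tasks than slots.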

Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

Presentation held at the 11th Workshop on Workflows in Support of Large-Scale Science (WORKS'16), 2016
Salt Lake City, UT, USA – SuperComputing’16

Abstract – Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. In spite of many success stories, a key challenge for running workflows in distributed systems is failure prediction, detection, and recovery. In this paper, we propose an approach that uses control theory developed as part of autonomic computing to predict failures before they happen and to mitigate them when possible. The proposed approach applies the proportional-integral-derivative (PID) controller control loop mechanism, which is widely used in industrial control systems, to mitigate faults by adjusting the inputs of the controller. The PID controller aims to detect the possibility of a fault far enough in advance that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of the Big Data era: data storage overload and memory overflow. We define, implement, and evaluate simple PID controllers to autonomously manage the data and memory usage of a bioinformatics workflow that consumes and produces over 4.4 TB of data and requires over 24 TB of memory to run all tasks concurrently. Experimental results indicate that workflow executions may significantly benefit from PID controllers, in particular under online and unknown conditions. Simulation results show that near-optimal executions (slowdown of 1.01) can be attained with the proposed method, and that faults are detected and mitigated far in advance of their occurrence.
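To show what a "simple PID controller" computes, here is a minimal discrete PID sketch applied to disk usage. The controller output is Kp·e + Ki·Σe·dt + Kd·Δe/dt, where e is the setpoint minus the measured value. The derivative term reacts to how fast usage is rising, which is what lets a controller act before a limit is actually reached. The gains, the 80% setpoint, the simulated samples, and the `storage_action` policy are all hypothetical. They are not the controllers, parameters, or actions defined in the paper.

```python
class PIDController:
    """Discrete PID controller: u = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, measurement: float, dt: float = 1.0) -> float:
        error = self.setpoint - measurement  # positive while below the setpoint
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def storage_action(output: float) -> str:
    """Hypothetical policy: a negative output means usage is at or heading past
    the setpoint, so stop releasing new data-producing tasks."""
    return "pause_new_tasks" if output < 0 else "allow_new_tasks"


if __name__ == "__main__":
    # Hypothetical gains and a setpoint of 80% disk usage.
    pid = PIDController(kp=1.0, ki=0.1, kd=0.5, setpoint=0.80)
    for usage in [0.50, 0.65, 0.75, 0.82, 0.90]:  # simulated disk-usage samples
        u = pid.update(usage)
        print(f"usage={usage:.2f} output={u:+.3f} -> {storage_action(u)}")
```

A production controller would also need tuned gains and protection against integral windup. Both are omitted here to keep the three PID terms visible.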


Related Publication

  • [PDF] R. Ferreira da Silva, R. Filgueira, E. Deelman, E. Pairo-Castineira, I. M. Overton, and M. Atkinson, “Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows,” in 11th Workflows in Support of Large-Scale Science, 2016, p. 15–24.
    author = {Ferreira da Silva, Rafael and Filgueira, Rosa and Deelman, Ewa and Pairo-Castineira, Erola and Overton, Ian Michael and Atkinson, Malcolm},
    title = {Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows},
    year = {2016},
    booktitle = {11th Workflows in Support of Large-Scale Science},
    series = {WORKS'16},
    pages = {15--24}

Task Resource Consumption Prediction for Scientific Applications and Workflows

Presentation held at the Dagstuhl Seminar on Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems, 2015
Dagstuhl, Germany

Abstract – Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable scientific application executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile real scientific applications and workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption. It looks for correlations between the parameters of a dataset. If no correlation is found, the dataset is divided into smaller subsets using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict task characteristics of scientific applications based on the collected data. For scientific workflows, we propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process yields much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.


Related Publications

  • [PDF] [DOI] R. Ferreira da Silva, G. Juve, M. Rynge, E. Deelman, and M. Livny, “Online Task Resource Consumption Prediction for Scientific Workflows,” Parallel Processing Letters, vol. 25, iss. 3, 2015.
    title = {Online Task Resource Consumption Prediction for Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Juve, Gideon and Rynge, Mats and Deelman, Ewa and Livny, Miron},
    journal = {Parallel Processing Letters},
    volume = {25},
    number = {3},
    pages = {},
    year = {2015},
    doi = {10.1142/S0129626415410030}
  • [PDF] [DOI] R. Ferreira da Silva, M. Rynge, G. Juve, I. Sfiligoi, E. Deelman, J. Letts, F. Würthwein, and M. Livny, “Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid (CMS) Experiment at LHC,” Procedia Computer Science, vol. 51, p. 39–48, 2015.
    title = {Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid ({CMS}) Experiment at {LHC}},
    author = {Ferreira da Silva, Rafael and Rynge, Mats and Juve, Gideon and Sfiligoi, Igor and Deelman, Ewa and Letts, James and W\"urthwein, Frank and Livny, Miron},
    journal = {Procedia Computer Science},
    year = {2015},
    volume = {51},
    pages = {39--48},
    note = {International Conference On Computational Science, {ICCS} 2015 Computational Science at the Gates of Nature},
    doi = {10.1016/j.procs.2015.05.190}
  • [PDF] [DOI] R. Ferreira da Silva, G. Juve, E. Deelman, T. Glatard, F. Desprez, D. Thain, B. Tovar, and M. Livny, “Toward fine-grained online task characteristics estimation in scientific workflows,” in 8th Workshop on Workflows in Support of Large-Scale Science, 2013, p. 58–67.
    author = {Ferreira da Silva, Rafael and Juve, Gideon and Deelman, Ewa and Glatard, Tristan and Desprez, Fr{\'e}d{\'e}ric and Thain, Douglas and Tovar, Benjamin and Livny, Miron},
    title = {Toward fine-grained online task characteristics estimation in scientific workflows},
    booktitle = {8th Workshop on Workflows in Support of Large-Scale Science},
    series = {WORKS '13},
    year = {2013},
    pages = {58--67},
    doi = {10.1145/2534248.2534254},