Pegasus: automate, recover, and debug scientific computations


• Automate scientific computations as portable workflows. Pegasus automatically locates the necessary input data and computational resources, and manages storage space when executing data-intensive workflows on storage-constrained resources.
• Recover from failures at runtime. Tasks are automatically retried when errors occur, a rescue workflow describing only the remaining work is generated, and provenance (data, software, parameters, etc.) is captured.
• Debug failures in computations using a set of system-provided debugging tools and an online workflow monitoring dashboard.
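
Below is a minimal sketch of what such a portable workflow looks like in code, assuming the Pegasus 5.x Python API (Pegasus.api); the executable and file names are illustrative placeholders rather than part of any shipped example.

    # Minimal sketch of a Pegasus workflow, assuming the Pegasus 5.x Python API.
    # Executable and file names below are illustrative placeholders.
    from Pegasus.api import Workflow, Job, File

    # Declare the data the workflow consumes and produces; at planning time
    # Pegasus resolves where these files physically live via its catalogs.
    raw = File("sample.dat")
    cleaned = File("sample.cleaned.dat")
    summary = File("summary.txt")

    wf = Workflow("example-analysis")

    # Two dependent tasks; the dependency follows from the shared file.
    preprocess = (
        Job("preprocess")                     # logical transformation name
        .add_args("--in", raw, "--out", cleaned)
        .add_inputs(raw)
        .add_outputs(cleaned)
    )
    analyze = (
        Job("analyze")
        .add_args("--in", cleaned, "--report", summary)
        .add_inputs(cleaned)
        .add_outputs(summary)
    )
    wf.add_jobs(preprocess, analyze)

    # Write the abstract workflow; pegasus-plan then maps it onto concrete
    # resources, retries failed tasks, and produces a rescue workflow if needed.
    wf.write("workflow.yml")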

 

Related Publications

  • [DOI] E. Deelman, K. Vahi, M. Rynge, G. Juve, R. Mayani, and R. Ferreira da Silva, “Pegasus in the Cloud: Science Automation through Workflow Technologies,” IEEE Internet Computing, vol. 20, iss. 1, pp. 70-76, 2016.
    [Bibtex]
    @article{deelman-ic-2016,
    title = {Pegasus in the Cloud: Science Automation through Workflow Technologies},
    author = {Deelman, Ewa and Vahi, Karan and Rynge, Mats and Juve, Gideon and Mayani, Rajiv and Ferreira da Silva, Rafael},
    journal = {{IEEE} Internet Computing},
    volume = {20},
    number = {1},
    pages = {70--76},
    year = {2016},
    doi = {10.1109/MIC.2016.15}
    }
  • [PDF] [DOI] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, and K. Wenger, “Pegasus, a Workflow Management System for Science Automation,” Future Generation Computer Systems, vol. 46, pp. 17-35, 2015.
    [Bibtex]
    @article{deelman-fgcs-2015,
    title = {Pegasus, a Workflow Management System for Science Automation},
    journal = {Future Generation Computer Systems},
    volume = {46},
    number = {0},
    pages = {17--35},
    year = {2015},
    doi = {10.1016/j.future.2014.10.008},
    author = {Deelman, Ewa and Vahi, Karan and Juve, Gideon and Rynge, Mats and Callaghan, Scott and Maechling, Phil J. and Mayani, Rajiv and Chen, Weiwei and Ferreira da Silva, Rafael and Livny, Miron and Wenger, Kent},
    }


Task Resource Consumption Prediction for Scientific Applications and Workflows


Presentation held at the Dagstuhl Seminar on Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems, 2015
Dagstuhl, Germany

Abstract – Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable scientific application executions. Such algorithms often assume that accurate estimates are available, but such estimates are difficult to generate in practice. In this work, we first profile real scientific applications and workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize task requirements based on these profiles. Our method estimates task runtime, disk space, and peak memory consumption. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict task characteristics of scientific applications based on the collected data. For scientific workflows, we propose an online estimation process based on the MAPE-K loop, where task executions are monitored and estimates are updated as more information becomes available. Experimental results show that our online estimation process results in much more accurate predictions than an offline approach, where all task requirements are estimated prior to workflow execution.
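
The short Python sketch below illustrates the online flavor of this process, not the paper's implementation: finished tasks are monitored, and the runtime estimate for the remaining tasks is refreshed from a correlation-based linear fit on an input parameter (falling back, where the paper would partition with conditional inference trees, to a simple mean). All names and numbers are invented for illustration.

    # Illustrative sketch (not the paper's implementation) of online estimation:
    # monitor finished tasks and refresh the runtime estimate for the rest,
    # preferring a linear fit on input size when the correlation is strong.
    import numpy as np

    class OnlineRuntimeEstimator:
        """Per-task-type estimator updated as tasks complete (MAPE-K style)."""

        def __init__(self, corr_threshold=0.8):
            self.corr_threshold = corr_threshold
            self.sizes, self.runtimes = [], []     # Monitor: observations so far

        def record(self, input_size, runtime):
            """Monitor: store the measured runtime of a finished task."""
            self.sizes.append(input_size)
            self.runtimes.append(runtime)

        def estimate(self, input_size):
            """Analyze/Plan: pick a model based on the observed correlation."""
            if len(self.runtimes) < 3:
                return None                        # not enough data yet
            x, y = np.array(self.sizes), np.array(self.runtimes)
            r = np.corrcoef(x, y)[0, 1]
            if abs(r) >= self.corr_threshold:
                slope, intercept = np.polyfit(x, y, 1)   # correlated: linear fit
                return slope * input_size + intercept
            return float(y.mean())                 # uncorrelated: fall back to mean

    # Execute: as each task finishes, feed the estimator and refresh the
    # predictions used for the tasks that are still waiting to run.
    est = OnlineRuntimeEstimator()
    for size, runtime in [(10, 12.0), (20, 24.5), (40, 47.9)]:
        est.record(size, runtime)
    print(est.estimate(80))                        # roughly 96 for an 80-unit input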

 

Related Publications

  • [PDF] [DOI] R. Ferreira da Silva, G. Juve, M. Rynge, E. Deelman, and M. Livny, “Online Task Resource Consumption Prediction for Scientific Workflows,” Parallel Processing Letters, vol. 25, iss. 3, 2015.
    [Bibtex]
    @article{ferreiradasilva-ppl-2015,
    title = {Online Task Resource Consumption Prediction for Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Juve, Gideon and Rynge, Mats and Deelman, Ewa and Livny, Miron},
    journal = {Parallel Processing Letters},
    volume = {25},
    number = {3},
    pages = {},
    year = {2015},
    doi = {10.1142/S0129626415410030}
    }
  • [PDF] [DOI] R. Ferreira da Silva, M. Rynge, G. Juve, I. Sfiligoi, E. Deelman, J. Letts, F. Würthwein, and M. Livny, “Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid (CMS) Experiment at LHC,” Procedia Computer Science, vol. 51, pp. 39-48, 2015.
    [Bibtex]
    @article{ferreiradasilva-iccs-2015,
    title = {Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid ({CMS}) Experiment at {LHC}},
    author = {Ferreira da Silva, Rafael and Rynge, Mats and Juve, Gideon and Sfiligoi, Igor and Deelman, Ewa and Letts, James and W\"urthwein, Frank and Livny, Miron},
    journal = {Procedia Computer Science},
    year = {2015},
    volume = {51},
    pages = {39--48},
    note = {International Conference On Computational Science, {ICCS} 2015, Computational Science at the Gates of Nature},
    doi = {10.1016/j.procs.2015.05.190}
    }
  • [PDF] [DOI] R. Ferreira da Silva, G. Juve, E. Deelman, T. Glatard, F. Desprez, D. Thain, B. Tovar, and M. Livny, “Toward fine-grained online task characteristics estimation in scientific workflows,” in 8th Workshop on Workflows in Support of Large-Scale Science, 2013, pp. 58-67.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2013,
    author = {Ferreira da Silva, Rafael and Juve, Gideon and Deelman, Ewa and Glatard, Tristan and Desprez, Fr{\'e}d{\'e}ric and Thain, Douglas and Tovar, Benjamin and Livny, Miron},
    title = {Toward fine-grained online task characteristics estimation in scientific workflows},
    booktitle = {8th Workshop on Workflows in Support of Large-Scale Science},
    series = {WORKS '13},
    year = {2013},
    pages = {58--67},
    doi = {10.1145/2534248.2534254},
    }

 


Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid (CMS) Experiment at LHC


Presentation held at the International Conference on Computational Science (ICCS), 2015
Reykjavik, Iceland

High throughput computing (HTC) has aided the scientific community in the analysis of vast amounts of data and computational jobs in distributed environments. To manage these large workloads, several systems have been developed to efficiently allocate and provide access to distributed resources. Many of these systems rely on job characteristics estimates (e.g., job runtime) to characterize the workload behavior, which in practice are hard to obtain. In this work, we perform an exploratory analysis of the CMS experiment workload using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict job characteristics based on the collected data. Experimental results show that our process estimates job runtime with 75% accuracy on average, and produces nearly optimal predictions for disk and memory consumption.
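
The paper's recursive partitioning is based on conditional inference trees (commonly R's ctree); the Python sketch below is only a rough analogue that uses scikit-learn's DecisionTreeRegressor on synthetic data to show the partition-then-estimate idea. The feature names and numbers are invented for illustration.

    # Rough analogue of the partition-then-estimate idea on synthetic data;
    # the paper itself uses conditional inference trees (e.g. R's ctree).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a job trace: task type and number of events,
    # with runtime roughly proportional to events but depending on the type.
    n = 500
    task_type = rng.integers(0, 3, size=n)
    events = rng.integers(1_000, 100_000, size=n)
    runtime = np.where(task_type == 0, 0.002, 0.005) * events + rng.normal(0, 30, n)

    X = np.column_stack([task_type, events])

    # A shallow tree: each leaf is a workload partition with similar behavior,
    # and its mean runtime becomes the estimate for new jobs in that partition.
    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20).fit(X, runtime)

    new_job = np.array([[1, 50_000]])   # a hypothetical type-1 job with 50k events
    print(f"predicted runtime: {tree.predict(new_job)[0]:.0f} s")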

 

Related Publication

  • [PDF] [DOI] R. Ferreira da Silva, M. Rynge, G. Juve, I. Sfiligoi, E. Deelman, J. Letts, F. Würthwein, and M. Livny, “Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid (CMS) Experiment at LHC,” Procedia Computer Science, vol. 51, pp. 39-48, 2015.
    [Bibtex]
    @article{ferreiradasilva-iccs-2015,
    title = {Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid ({CMS}) Experiment at {LHC}},
    author = {Ferreira da Silva, Rafael and Rynge, Mats and Juve, Gideon and Sfiligoi, Igor and Deelman, Ewa and Letts, James and W\"urthwein, Frank and Livny, Miron},
    journal = {Procedia Computer Science},
    year = {2015},
    volume = {51},
    pages = {39--48},
    note = {International Conference On Computational Science, {ICCS} 2015, Computational Science at the Gates of Nature},
    doi = {10.1016/j.procs.2015.05.190}
    }

 
