Running Accurate, Scalable, and Reproducible Simulations of Distributed Systems with WRENCH

Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empirical experiments conducted with full-fledged software stacks on actual hardware platforms. Such experiments, however, are limited to the hardware and software infrastructures at hand and can be labor- and/or time-intensive. As a result, relying solely on real-world experiments impedes WMS research and development. An alternative is to conduct experiments in simulation.

In this work, we present WRENCH, a WMS simulation framework whose objectives are (i) accurate and scalable simulations; and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it only provides low-level simulation abstractions, so implementing simulators of complex systems requires large software development efforts. WRENCH achieves its second objective by providing high-level and directly reusable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH’s software architecture and APIs, we present a case study in which we apply WRENCH to simulate the Pegasus production WMS. We report on ease of implementation, simulation accuracy, and simulation scalability to determine to what extent WRENCH achieves these two objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator.

Empirical cumulative distribution function of task submit times (left) and task completion times (right) for sample real-world (“pegasus”) and simulated (“wrench” and “workflowsim”) executions of Montage-2.0 on AWS-m5.xlarge.
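As a rough illustration of the kind of comparison the figure shows, the sketch below builds an empirical cumulative distribution function (ECDF) from a list of timestamps and reports the largest vertical gap between two ECDFs. It is only a minimal sketch: the function names `ecdf` and `max_ecdf_gap` and the sample values are hypothetical, and it is not necessarily the accuracy metric used in the paper.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return a function F where F(t) is the fraction of samples <= t."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def F(t):
        # bisect_right counts how many sorted values are <= t
        return bisect_right(ordered, t) / n

    return F


def max_ecdf_gap(a, b):
    """Largest vertical distance between the ECDFs of a and b.

    Both ECDFs are step functions that only change at sample values,
    so it is enough to evaluate them at every observed value.
    """
    Fa, Fb = ecdf(a), ecdf(b)
    return max(abs(Fa(t) - Fb(t)) for t in sorted(set(a) | set(b)))


# Hypothetical task completion times in seconds (not data from the paper).
real_completion = [10.0, 20.0, 30.0, 40.0]
simulated_completion = [12.0, 20.0, 33.0, 40.0]

gap = max_ecdf_gap(real_completion, simulated_completion)
print(f"max ECDF gap: {gap}")
```

A gap near 0 means the simulated distribution of times closely tracks the real one; a gap near 1 means the two distributions barely overlap.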

 

Reference to the paper:

  • H. Casanova, S. Pandey, J. Oeth, R. Tanaka, F. Suter, and R. Ferreira da Silva, “WRENCH: A Framework for Simulating Workflow Management Systems,” in 13th Workshop on Workflows in Support of Large-Scale Science (WORKS’18), 2018, pp. 74–85.
    BibTeX:
    @inproceedings{casanova-works-2018,
    title = {WRENCH: A Framework for Simulating Workflow Management Systems},
    author = {Casanova, Henri and Pandey, Suraj and Oeth, James and Tanaka, Ryan and Suter, Frederic and Ferreira da Silva, Rafael},
    booktitle = {13th Workshop on Workflows in Support of Large-Scale Science (WORKS'18)},
    year = {2018},
    pages = {74--85},
    doi = {10.1109/WORKS.2018.00013}
    }

 


 
