Figure. Characterization of state-of-the-art WMSs. The classification highlights characteristics relevant to attaining extreme scale.

Automation of the execution of computational tasks is at the heart of improving scientific productivity. Scientific workflows have supported breakthroughs across several domains such as astronomy, physics, climate science, earthquake science, biology, and others. Scientific workflow management systems (WMS) are critical automation components that enable efficient and robust workflow execution across heterogeneous infrastructures.

In this paper, we seek to understand the requirements and characteristics of state-of-the-art WMSs for extreme-scale applications. We survey and classify 15 popular workflow systems, together with the applications they support, in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper identifies a number of properties that future WMSs need to support in order to meet extreme-scale requirements, as well as research gaps in the state of the art.

This paper has been published in Future Generation Computer Systems and is available online here.


Abstract – Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.


Reference to the paper