The WorkflowHub project is a community framework for enabling scientific workflow research and development. It provides foundational tools for analyzing workflow execution traces and for generating synthetic, yet realistic, workflow traces. These traces can then be used to experimentally evaluate and develop novel algorithms and systems that execute increasingly demanding workflows efficiently and robustly on ever more complex distributed infrastructures.

The WorkflowHub project uses a common format to represent both collected workflow traces and generated synthetic workflow traces. Workflow simulators and simulation frameworks that support this format can therefore use both types of traces interchangeably.

To let users analyze existing workflow traces and generate synthetic ones, the WorkflowHub framework provides a collection of tools released as an open source Python package. For analysis, the package can produce statistical summaries of workflow performance characteristics per task type. The package also provides a number of workflow recipes for generating realistic synthetic workflow traces; its current version includes recipes for all six applications.
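The idea of summarizing performance per task type and then using those summaries to generate synthetic traces can be illustrated with a minimal, self-contained sketch. It does not use the WorkflowHub package API. The trace layout (a list of dicts with hypothetical `category` and `runtime` fields) and the functions `summarize_by_type` and `sample_synthetic_runtimes` are illustrative assumptions. Fitting a normal distribution per task type is a simplification chosen for brevity, not necessarily what WorkflowHub's recipes do.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical, simplified trace: each task records its type and runtime in seconds.
# This is NOT the WorkflowHub trace format, only an illustration.
TRACE = [
    {"name": "split_1", "category": "split", "runtime": 12.0},
    {"name": "split_2", "category": "split", "runtime": 14.0},
    {"name": "align_1", "category": "align", "runtime": 100.0},
    {"name": "align_2", "category": "align", "runtime": 110.0},
    {"name": "align_3", "category": "align", "runtime": 120.0},
    {"name": "merge_1", "category": "merge", "runtime": 30.0},
]


def summarize_by_type(tasks):
    """Group task runtimes by task type and compute count, mean, and std deviation."""
    runtimes = defaultdict(list)
    for task in tasks:
        runtimes[task["category"]].append(task["runtime"])
    summary = {}
    for category, values in runtimes.items():
        summary[category] = {
            "count": len(values),
            "mean": statistics.mean(values),
            # pstdev is defined for a single sample (returns 0.0); stdev is not.
            "std": statistics.pstdev(values),
        }
    return summary


def sample_synthetic_runtimes(summary, counts, seed=None):
    """Draw synthetic task runtimes per type from a normal distribution fitted to the summary.

    The normal distribution is an illustrative assumption; negative draws are clamped to 0.
    """
    rng = random.Random(seed)
    synthetic = []
    for category, n in counts.items():
        stats = summary[category]
        for i in range(n):
            runtime = max(0.0, rng.gauss(stats["mean"], stats["std"]))
            synthetic.append(
                {"name": f"{category}_{i + 1}", "category": category, "runtime": runtime}
            )
    return synthetic


if __name__ == "__main__":
    summary = summarize_by_type(TRACE)
    for category, stats in summary.items():
        print(category, stats)
    # Scale the workflow up: ten alignment tasks instead of three.
    synthetic = sample_synthetic_runtimes(summary, {"split": 2, "align": 10, "merge": 1}, seed=42)
    print(len(synthetic), "synthetic tasks")
```

Separating the summary step from the sampling step mirrors the split between analysis and generation: the per-type statistics are computed once from a real trace, and a recipe-like function can then produce workflows of a different size from them.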

Read our technical report, available on arXiv.