Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows

Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance for real-world scientific workflow applications is still not clear. In this paper, we examine the impact of burst buffers through the remote-shared, allocatable burst buffers on the Cori system at NERSC. We run two data-intensive workflows: a high-throughput genome analysis workflow, and a subset of the SCEC high-performance CyberShake workflow, a production seismic hazard analysis workflow. We find that using burst buffers improves read and write performance by an order of magnitude, and that these improvements lead to increased job performance, and thereby increased overall workflow performance, even for long-running CPU-bound jobs.

I/O Read: performance comparison of read operations with burst buffers (top) and the PFS (bottom) at NERSC.

Reference to the paper:

  • R. Ferreira da Silva, S. Callaghan, T. M. A. Do, G. Papadimitriou, and E. Deelman, “Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows,” Future Generation Computer Systems, vol. 101, pp. 208–220, 2019.
    [Bibtex]
    @article{ferreiradasilva-fgcs-bb-2019,
    title = {Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Callaghan, Scott and Do, Tu Mai Anh and Papadimitriou, George and Deelman, Ewa},
    journal = {Future Generation Computer Systems},
    volume = {101},
    pages = {208--220},
    year = {2019},
    doi = {10.1016/j.future.2019.06.016}
    }

On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows


Presentation given at the 12th Workshop on Workflows in Support of Large-Scale Science (WORKS’17), 2017
Denver, CO, USA – SuperComputing’17

Abstract – Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance for real-world workflow applications is not always clear. In this paper, we examine the impact of burst buffers through the remote-shared, allocatable burst buffers on the Cori system at NERSC. By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, we find that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.

Related Publication

  • R. Ferreira da Silva, S. Callaghan, and E. Deelman, “On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows,” in 12th Workshop on Workflows in Support of Large-Scale Science (WORKS’17), 2017.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2017,
    title = {On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Callaghan, Scott and Deelman, Ewa},
    booktitle = {12th Workshop on Workflows in Support of Large-Scale Science (WORKS'17)},
    year = {2017},
    doi = {10.1145/3150994.3151000}
    }
