Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows

Scientific applications frequently produce and consume large volumes of data, but moving this data to and from compute resources can be challenging because parallel file system performance is not keeping pace with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on the performance of real-world scientific workflow applications is still unclear. In this paper, we examine the impact of burst buffers using the remote-shared, allocatable burst buffers on the Cori system at NERSC. We run two data-intensive workflows: a high-throughput genome analysis workflow and a subset of the SCEC high-performance CyberShake workflow, a production seismic hazard analysis workflow. We find that burst buffers improve read and write performance by an order of magnitude, and that these improvements increase job performance and, in turn, overall workflow performance, even for long-running CPU-bound jobs.

I/O Read: performance comparison of read operations with burst buffers (top) and the PFS (bottom) at NERSC.

Reference to the paper:

  • R. Ferreira da Silva, S. Callaghan, T. M. A. Do, G. Papadimitriou, and E. Deelman, “Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows,” Future Generation Computer Systems, vol. 101, pp. 208–220, 2019.
    BibTeX:
    @article{ferreiradasilva-fgcs-bb-2019,
    title = {Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Callaghan, Scott and Do, Tu Mai Anh and Papadimitriou, George and Deelman, Ewa},
    journal = {Future Generation Computer Systems},
    volume = {101},
    pages = {208--220},
    year = {2019},
    doi = {10.1016/j.future.2019.06.016}
    }
