Science applications frequently produce and consume large volumes of data, but moving this data to and from compute resources can be challenging because parallel file system performance is not keeping pace with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on the performance of real-world scientific workflow applications is still unclear. In this paper, we examine that impact using the remote-shared, allocatable burst buffers on the Cori system at NERSC. We run two data-intensive workflows: a high-throughput genome analysis workflow, and a subset of the SCEC high-performance CyberShake workflow, a production seismic hazard analysis workflow. We find that burst buffers improve read and write performance by an order of magnitude, and that these improvements lead to increased job performance, and thereby increased overall workflow performance, even for long-running CPU-bound jobs.
Reference to the paper
Ferreira da Silva, R., Callaghan, S., Do, T. M. A., Papadimitriou, G., & Deelman, E. (2019). Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows. Future Generation Computer Systems, 101, 208–220. https://doi.org/10.1016/j.future.2019.06.016