Bridging from Concepts to Data and Computation for eScience — BC2DC’19 @eScience19

The Bridging from Concepts to Data and Computation for eScience (BC2DC’19) Workshop will be held in conjunction with eScience’19 on Tuesday September 24, 2019 in San Diego, CA.

Important Dates

  • Papers due: July 15, 2019 (extended from July 10, 2019)
  • Paper Acceptance Notification: July 24, 2019
  • Camera-ready deadline: July 29, 2019
  • Workshop: September 24, 2019

Workshop Description

How can we enable e-Science developers to conceptualize research and translate it to system requirements?
How should we make such processes understandable, reliable, stable and sustainable? 
How should advances in engineering deliver the expanding power of distributed computation, heterogeneous (cloud and data) platforms and the massive – still rapidly growing – wealth of data?
How can we make it easier for organizations and researchers to engage in multiple research collaborations and to adapt rapidly to changing requirements and new opportunities?

Research addressing global challenges federates a growing diversity of disciplines, requires sustained contributions from many autonomous organizations, and builds on heterogeneous, evolving computational platforms. Scientific knowledge is scattered across cloud-based services and local storage, and embedded in source code targeting specific architectures and computational contexts. Concepts reflected in these disparate sources are rarely computer-communicable or computer-actionable across, or even within, disciplines. This makes traceability, communication of methods, provenance gathering, and the reuse of data and methods harder and more time-consuming. Agile response to new needs and opportunities may be accelerated when the available methods and required components have mutually comprehensible descriptions. Commercial clouds play an increasingly important role in large-scale scientific experimentation. Examples of diversity in technology and jurisdiction, as well as of the large-scale exploitation of clouds, can be found on both sides of the Atlantic: in the European Open Science Cloud (EOSC) and in NASA's ongoing large-scale migration of data and other resources onto Amazon's AWS.

It follows that while the potential for large-scale data-driven experimentation increases, so do complexity and the risk of becoming locked into vendor-specific solutions. To deal with these challenges, and to help researchers make better and more transparent use of diverse infrastructures, many systems offer higher-level abstractions that hide and orchestrate infrastructural and implementation details. Domain experts need to directly control sophisticated and dynamic concepts pertaining to data, execution contexts and diverse e-infrastructures. Furthermore, they need mechanisms that allow them to take responsibility for the quality of results without being distracted by technological artefacts.

These often take the form of service-based platforms, containerised solutions, APIs, ontological descriptions of underlying resources, provenance repositories, etc. This workshop focuses on platform-driven and domain-specific developments that contribute towards unifying underlying platforms, clouds, data, computational resources and concepts in order to empower research developers to deliver, maintain and communicate larger, increasingly complex eScience systems.
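
As a purely hypothetical sketch of what a mutually comprehensible, machine-actionable component description could look like (none of the names or URIs below come from an existing platform), consider a description rich enough for a composition tool to check compatibility before execution:

# Hypothetical sketch: a machine-actionable component description with
# typed inputs/outputs, so composition tools can check compatibility and
# provenance systems can record what actually ran. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str
    datatype: str              # e.g. a concept URI or schema identifier
    unit: str = ""

@dataclass
class ComponentDescription:
    name: str
    version: str
    inputs: list[Port] = field(default_factory=list)
    outputs: list[Port] = field(default_factory=list)
    execution_context: str = ""  # e.g. a container image or platform tag

def compatible(producer: ComponentDescription, consumer: ComponentDescription) -> bool:
    """A consumer composes with a producer if every input it needs is
    offered as an output with the same declared datatype."""
    offered = {(p.name, p.datatype) for p in producer.outputs}
    return all((p.name, p.datatype) in offered for p in consumer.inputs)

# Example with invented components and an example.org concept URI.
ingest = ComponentDescription(
    "ocean-ingest", "1.2",
    outputs=[Port("sst_grid", "https://example.org/concepts/SeaSurfaceTemperature", "K")],
    execution_context="docker://example/ingest:1.2")
analysis = ComponentDescription(
    "heatwave-detect", "0.9",
    inputs=[Port("sst_grid", "https://example.org/concepts/SeaSurfaceTemperature", "K")])
print(compatible(ingest, analysis))   # True: the descriptions agree on the concept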

In particular we welcome contributions in the following areas, not excluding other topics of interest:

  • Semantic concept description and implementation 
  • Specification and execution of conceptually formulated methods
  • Component descriptions facilitating reliable composition
  • Architectures, frameworks and design patterns delivering flexible use and incremental composition
  • Cloud, fog, edge and specialized platforms
  • Pervasive and persistent provenance 
  • Platforms of platforms, containers, orchestration and microservices
  • HPC in the Cloud


14th Workflows in Support of Large-Scale Science – WORKS 2019 @SC19

The 14th Workflows in Support of Large-Scale Science (WORKS) Workshop will be held in conjunction with SC19 on Sunday November 17, 2019 in Denver, CO.

Important Dates

  • Papers due: August 26, 2019 (final extension; originally July 15, 2019)
  • Paper Acceptance Notification: September 1, 2019
  • E-copyright registration completed by authors: October 1, 2019
  • Camera-ready deadline: October 1, 2019
  • Workshop: November 17, 2019

Workshop Description

Data-intensive workflows (a.k.a. scientific workflows) are routinely used in most scientific disciplines today, especially in the context of parallel and distributed computing. Workflows provide a systematic way of describing an analysis and rely on workflow management systems to execute complex analyses on a variety of distributed resources. They sit at the interface between end-users and computing infrastructures. With the dramatic increase in raw data volume in every domain, they play an even more critical role in assisting scientists in organizing and processing their data and in leveraging HPC or HTC resources; for example, workflows played an important role in the discovery of gravitational waves.

This workshop focuses on the many facets of data-intensive workflow management systems, ranging from job execution to service management and the coordination of data, service and job dependencies. The workshop therefore covers a broad range of issues in the scientific workflow lifecycle, including: data-intensive workflow representation and enactment; the design of workflow composition interfaces; workflow mapping techniques that optimize workflow execution; workflow enactment engines that must deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, and fault detection and tolerance.
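
To give a concrete, deliberately simplified picture of what such systems manage, the sketch below represents a workflow as a directed acyclic graph of tasks and runs it in dependency order; it is an illustration only, not the interface of any particular workflow management system, and the task names and commands are invented.

# Minimal sketch of a workflow as a DAG of tasks executed in dependency
# order. A real workflow management system would also handle data
# movement, resource selection, provenance capture and failure recovery.
from graphlib import TopologicalSorter   # Python 3.9+
import subprocess

# Each task: a command to run plus the tasks whose outputs it depends on.
tasks = {
    "extract":   {"cmd": ["echo", "extracting raw data"], "deps": []},
    "transform": {"cmd": ["echo", "transforming data"],   "deps": ["extract"]},
    "analyze":   {"cmd": ["echo", "running analysis"],    "deps": ["transform"]},
    "plot":      {"cmd": ["echo", "plotting results"],    "deps": ["analyze"]},
}

def run_workflow(tasks):
    """Execute tasks in topological order (dependencies first)."""
    order = TopologicalSorter({name: spec["deps"] for name, spec in tasks.items()})
    for name in order.static_order():
        print(f"submitting task: {name}")
        subprocess.run(tasks[name]["cmd"], check=True)

run_workflow(tasks)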



13th Workflows in Support of Large-Scale Science – WORKS @SC18


Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis

Data-intensive workflows (a.k.a. scientific workflows) are routinely used in most scientific disciplines today, especially in the context of high-performance, parallel and distributed computing. They provide a systematic way of describing a complex scientific process and rely on sophisticated workflow management systems to execute it on a variety of parallel and distributed resources. With the dramatic increase in raw data volume in every domain, they play an even more critical role in assisting scientists in organizing and processing their data and in leveraging HPC or HTC resources, sitting at the interface between end-users and computing infrastructures.

This workshop focuses on the many facets of data-intensive workflow management systems, ranging from actual execution to service management and the coordination and optimization of data, service and job dependencies. The workshop covers a broad range of issues in the scientific workflow lifecycle, including: data-intensive workflow representation and enactment; the design of workflow composition interfaces; workflow mapping techniques that optimize workflow execution for different infrastructures; workflow enactment engines that must deal with failures in the application and execution environment; and a number of computer science problems related to scientific workflows, such as semantic technologies, compiler methods, scheduling, and fault detection and tolerance.
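
The sketch below illustrates one small piece of this lifecycle, the retry logic an enactment engine might wrap around a failing task; it is a generic, hypothetical example rather than the behaviour of any particular engine, which would typically also resubmit to alternative resources, checkpoint, and update provenance records.

# Minimal sketch of per-task fault handling: retry transient failures
# with exponential back-off and surface permanent failures upward.
import subprocess
import time

def run_with_retries(cmd, max_retries=3, base_delay=5.0):
    """Run a task command, retrying on failure with exponential back-off."""
    for attempt in range(1, max_retries + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True                      # task succeeded
        wait = base_delay * 2 ** (attempt - 1)
        print(f"attempt {attempt} failed (exit {result.returncode}); retrying in {wait:.0f}s")
        time.sleep(wait)
    return False                             # permanent failure: report to the workflow level

# Example: a deliberately failing task that the engine retries before giving up.
if not run_with_retries(["bash", "-c", "exit 1"], max_retries=2, base_delay=1.0):
    print("task failed permanently; holding downstream tasks")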


Important Dates

  • Papers due: August 13, 2018 (extended from July 30, 2018)
  • Paper Acceptance Notification: September 25, 2018 (extended from September 9, 2018)

On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows


Presentation held at the 12th Workflows in Support of Large-Scale Science, 2017
Denver, CO, USA – SuperComputing’17

Abstract – Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging, as parallel file system performance is not keeping up with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on performance for real-world workflow applications is not always clear. In this paper, we examine the impact of burst buffers through the remote-shared, allocatable burst buffers on the Cori system at NERSC. By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, we find that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.
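
As a rough illustration of the staging pattern such a workflow can use (not the CyberShake implementation itself), the sketch below copies inputs from the parallel file system into a burst buffer allocation, runs a compute step against the fast tier, and stages results back. The DW_JOB_STRIPED environment variable is how striped DataWarp allocations are typically exposed on Cori; all paths and the executable name here are hypothetical.

# Illustrative burst-buffer staging pattern (hypothetical paths and binary).
import os
import shutil
import subprocess

scratch = "/global/cscratch1/sd/user/run001"           # hypothetical PFS run directory
bb_root = os.environ.get("DW_JOB_STRIPED", "/tmp/bb")  # burst buffer mount (fallback for testing)
os.makedirs(bb_root, exist_ok=True)

def stage_in(filenames):
    """Copy inputs from the parallel file system into the burst buffer."""
    for name in filenames:
        shutil.copy(os.path.join(scratch, name), os.path.join(bb_root, name))

def stage_out(filenames):
    """Copy results back to the parallel file system for archiving."""
    for name in filenames:
        shutil.copy(os.path.join(bb_root, name), os.path.join(scratch, name))

stage_in(["mesh.bin", "velocity_model.bin"])
# Compute against the fast tier instead of the parallel file system.
subprocess.run(["./seismogram_synthesis", "--workdir", bb_root], check=True)
stage_out(["seismograms.out"])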

 

Related Publication

  • [PDF] [DOI] R. Ferreira da Silva, S. Callaghan, and E. Deelman, “On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows,” in 12th Workshop on Workflows in Support of Large-Scale Science (WORKS’17), 2017.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2017,
    title = {On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows},
    author = {Ferreira da Silva, Rafael and Callaghan, Scott and Deelman, Ewa},
    booktitle = {12th Workshop on Workflows in Support of Large-Scale Science (WORKS'17)},
    year = {2017},
    pages = {},
    doi = {10.1145/3150994.3151000}
    }

 


Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows


Presentation held at the 11th Workflows in Support of Large-Scale Science, 2016
Salt Lake City, UT, USA – SuperComputing’16

Abstract – Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. In spite of many success stories, a key challenge for running workflows in distributed systems is failure prediction, detection, and recovery. In this paper, we propose an approach that uses control theory developed as part of autonomic computing to predict failures before they happen, and to mitigate them when possible. The proposed approach applies the proportional-integral-derivative (PID) controller control loop mechanism, which is widely used in industrial control systems, to mitigate faults by adjusting the inputs of the controller. The PID controller aims at detecting the possibility of a fault far enough in advance so that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of the Big Data era: data storage overload and memory overflow. We define, implement, and evaluate simple PID controllers to autonomously manage the data and memory usage of a bioinformatics workflow that consumes/produces over 4.4TB of data and requires over 24TB of memory to run all tasks concurrently. Experimental results indicate that workflow executions may significantly benefit from PID controllers, in particular under online and unknown conditions. Simulation results show that nearly-optimal executions (slowdown of 1.01) can be attained when using our proposed method, and faults are detected and mitigated far in advance of their occurrence.
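
The sketch below illustrates the general idea, a PID control loop whose output throttles how many tasks the workflow releases as scratch-disk usage approaches a set-point; the gains, set-point and actuation strategy are hypothetical, and this is not the implementation evaluated in the paper.

# Minimal PID controller sketch for throttling task release by disk usage.
class PIDController:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint          # target disk-usage fraction
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt=1.0):
        """Return a control signal from the current measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def throttle_tasks(current_parallelism, disk_usage_fraction, controller,
                   min_tasks=1, max_tasks=64):
    """Map the control signal onto a bounded task-release budget."""
    signal = controller.update(disk_usage_fraction)
    # Positive signal: usage below set-point, release more tasks;
    # negative signal: usage above set-point, hold tasks back.
    adjusted = current_parallelism + round(signal * max_tasks)
    return max(min_tasks, min(max_tasks, adjusted))

# Example: keep scratch usage near 80% while the workflow runs.
controller = PIDController(kp=0.5, ki=0.05, kd=0.1, setpoint=0.80)
parallelism = 16
for usage in (0.60, 0.75, 0.85, 0.95):    # simulated disk-usage samples
    parallelism = throttle_tasks(parallelism, usage, controller)
    print(f"disk usage {usage:.0%} -> release up to {parallelism} tasks")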

 

Related Publication

  • [PDF] R. Ferreira da Silva, R. Filgueira, E. Deelman, E. Pairo-Castineira, I. M. Overton, and M. Atkinson, “Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows,” in 11th Workflows in Support of Large-Scale Science, 2016, p. 15–24.
    [Bibtex]
    @inproceedings{ferreiradasilva-works-2016,
    author = {Ferreira da Silva, Rafael and Filgueira, Rosa and Deelman, Ewa and Pairo-Castineira, Erola and Overton, Ian Michael and Atkinson, Malcolm},
    title = {Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows},
    year = {2016},
    booktitle = {11th Workflows in Support of Large-Scale Science},
    series = {WORKS'16},
    pages = {15--24}
    }

 
