Directed Research

Marina del Rey, CA - View from ISI

Are you a USC student looking for research experience? I am currently looking for motivated students who want to conduct small research projects in the area of distributed systems, with an emphasis on scientific workflows, cloud computing, containers, data and resource management, and computational simulation.

If you are interested in any of the research projects below, please contact me (rafsilva@…). Please also see USC's guidance for Directed Research students for additional information on how to proceed.

We are located off campus at the Information Sciences Institute (Marina del Rey, CA). USC provides a free, continuous shuttle service between the campus and ISI. Please visit USC's transportation website for the schedules.

Projects for Spring 2018 term:

Project #1: Building a Workflow Management System Simulation Workbench

Overview —
WRENCH provides a software framework that makes it possible to simulate large-scale hypothetical scenarios quickly and accurately on a single computer, obviating the need for expensive and time-consuming trial-and-error experiments. WRENCH enables scientists to make quick and informed choices when executing their workflows, software developers to implement more efficient software infrastructures to support workflows, and researchers to develop novel, efficient algorithms to be embedded within these software infrastructures.

Summary of Activities — The student will implement scheduling and optimization algorithms using WRENCH as the simulation platform. The goal of this study is to compare workflow scheduling and optimization approaches from the literature that have so far only been used independently of one another.
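To give a rough idea of what comparing scheduling approaches in simulation looks like, here is a minimal Python sketch. It does not use WRENCH (which is a C++ framework) or any of its API. The toy workflow, the `simulate` function, and the two priority rules are all hypothetical and chosen only for illustration.

```python
import heapq

# Hypothetical toy workflow: task -> (runtime in seconds, list of parent tasks).
WORKFLOW = {
    "a": (4.0, []),
    "b": (2.0, ["a"]),
    "c": (6.0, ["a"]),
    "d": (3.0, ["a"]),
    "e": (1.0, ["b", "c", "d"]),
}


def shortest_first(task):
    """Priority rule: prefer the ready task with the smallest runtime."""
    return WORKFLOW[task][0]


def longest_first(task):
    """Priority rule: prefer the ready task with the largest runtime."""
    return -WORKFLOW[task][0]


def simulate(workflow, num_hosts, priority):
    """List scheduling on identical hosts; returns the makespan in seconds.

    Whenever a host is free, the ready task with the smallest priority key
    is started. The workflow is assumed to be acyclic.
    """
    remaining = {t: set(parents) for t, (_, parents) in workflow.items()}
    children = {t: [] for t in workflow}
    for task, (_, parents) in workflow.items():
        for parent in parents:
            children[parent].append(task)

    ready = [t for t, parents in remaining.items() if not parents]
    running = []  # min-heap of (finish_time, task)
    now, free_hosts, done = 0.0, num_hosts, 0

    while done < len(workflow):
        # Start as many ready tasks as there are free hosts.
        ready.sort(key=priority)
        while ready and free_hosts > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + workflow[task][0], task))
            free_hosts -= 1
        # Advance simulated time to the next task completion.
        now, finished = heapq.heappop(running)
        free_hosts += 1
        done += 1
        # Release children whose parents have all completed.
        for child in children[finished]:
            remaining[child].discard(finished)
            if not remaining[child]:
                ready.append(child)
    return now


if __name__ == "__main__":
    for rule in (shortest_first, longest_first):
        print(rule.__name__, simulate(WORKFLOW, num_hosts=2, priority=rule))
```

Even on this five-task example the two rules give different makespans on two hosts, which is the kind of difference a simulation study would measure at much larger scale and with realistic platform models.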

Required Skills: C++, Git
Desired Skills: Unix, Google Test

Project #2: Enabling Scientific Workflow executions via Jupyter Notebooks

Overview — The Jupyter Notebook is a web application that allows users to create and share documents that contain live code, equations, visualizations, and explanatory text. Its flexible and portable format has led to rapid adoption by the research community as a way to share and interact with experiments. Jupyter Notebooks have strong potential to narrow the gap between researchers and the complex knowledge required to run large-scale scientific workflows, by offering a high-level programmatic interface for accessing and managing workflow capabilities.

Summary of Activities — The student will extend Pegasus' Python API for Jupyter to manage workflow executions (submission and monitoring) and to collect workflow performance information (e.g., statistics and errors).

Required Skills: Python, Git
Desired Skills: Jupyter, Unix

Project #3: Building tools to support research in Scientific Workflows


Overview —
A significant amount of recent research in scientific workflows aims to develop new techniques, algorithms, and systems that can overcome the challenges of efficient and robust execution of ever larger workflows on increasingly complex distributed infrastructures. Since the infrastructures, systems, and applications are complex, and their behavior is difficult to reproduce using physical experiments, much of this research is based on simulation. Workflow execution traces and synthetic workflow generators are the main tools researchers use to evaluate their novel algorithms and mechanisms. Although several trace archives are available to the community, there is no standard mechanism for representing them in a consistent manner.

Summary of Activities — The student will extend the tools available as part of the WorkflowHub project to parse Pegasus' output logs and create JSON trace files that will be made available online via GitHub. The student will also implement a Python version of a Workflow Generator (based on our previous work), which will also be made publicly available on the WorkflowHub project website.
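As a rough illustration of the log-to-JSON step, the Python sketch below parses a made-up, whitespace-separated log and emits a JSON trace. The log layout, the field names, and the `parse_log` and `to_trace` helpers are all assumptions made for this example. They are not Pegasus' actual log format or the WorkflowHub trace schema.

```python
import json

# Hypothetical log layout (NOT Pegasus' real format): one task per line.
SAMPLE_LOG = """\
# task_id start_s end_s exit_code
preprocess 0.0 12.5 0
analyze 12.5 40.0 0
plot 40.0 41.2 1
"""


def parse_log(text):
    """Turn log lines into a list of task records, skipping comments."""
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, start, end, code = line.split()
        start, end = float(start), float(end)
        tasks.append({
            "name": name,
            "start": start,
            "end": end,
            "runtime": round(end - start, 3),
            "exitCode": int(code),
        })
    return tasks


def to_trace(workflow_name, tasks):
    """Wrap task records in a trace document with a few summary fields."""
    if tasks:
        makespan = max(t["end"] for t in tasks) - min(t["start"] for t in tasks)
    else:
        makespan = 0.0
    return {
        "name": workflow_name,
        "makespan": makespan,
        "failedTasks": [t["name"] for t in tasks if t["exitCode"] != 0],
        "tasks": tasks,
    }


if __name__ == "__main__":
    trace = to_trace("example-workflow", parse_log(SAMPLE_LOG))
    print(json.dumps(trace, indent=2))
```

Keeping parsing (`parse_log`) separate from serialization (`to_trace`) means a different log source could reuse the same trace layout, which is the point of having a consistent representation.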

Required Skills: Python, Git
Desired Skills: JSON, Unix

Project #4: Building data-streaming infrastructure for data analytics in real-time

Overview — Despite much work on workflow management that has focused on solving important engineering challenges, and despite the production use of workflow management systems (WMSs) in many scientific domains, the current state of the art lacks a deep understanding of the requirements, characteristics, and relationships of current and next-generation applications and systems.

Summary of Activities — The student will use Docker Compose to extend the existing set of Docker containers into a data-streaming infrastructure for real-time data analytics. The student will work with state-of-the-art technologies for deploying and configuring analytics and data-stream processing within a cloud environment.

Required Skills: Unix, Git
Desired Skills: Docker, Apache Kafka, Elasticsearch, Apache Spark