Research
My main area of research is developing solutions for the management of scientific workflows in distributed environments.
Data analysis within scientific collaborations is a large-scale and rigorous process in which large amounts of data (on the order of terabytes) are analyzed. Applications are being built not as monolithic entities designed by a single individual, but rather as complex workflows composed of application components. Often these components are designed, developed, and tested collaboratively. Because of the size of the data and the complexity of the analysis, large computer clusters are being used to store the data sets and execute the workflows. As the size of the data and of the analysis grows, scientific collaborations are pooling their resources into distributed systems such as the grid. The grid is a distributed system that seamlessly connects resources (compute resources, storage, instruments, etc.) across the wide area network and provides the software, such as the Globus Toolkit, to securely submit jobs remotely, transfer data, operate apparatus, and perform other remote operations.
Managing the data and analysis in a systematic, robust and collaborative fashion is now one of the foci of many IT projects. The National Science Foundation is currently funding several projects, such as the Grid Physics Network (GriPhyN), the National Virtual Observatory (NVO), and the Southern California Earthquake Center/IT (SCEC/IT), to provide software that will aid scientists in discovering data and metadata (descriptive information about data products), in setting up and executing complex analyses, and in storing and sharing information about newly derived results.
My work focuses on designing software solutions that help scientists in a variety of disciplines easily execute complex analyses on distributed and heterogeneous resources. In particular, I have been working on the development of the Pegasus system, which maps abstract workflows onto the grid. Abstract workflows describe the analysis in terms of logical transformations and data without identifying the resources needed to execute the workflow.
Mapping the abstract workflow description to an executable form involves finding the resources that are available and can perform the computations, the data that is used in the workflow, and the necessary software. Pegasus consults various Grid information services to find this information. Pegasus also reuses existing intermediate data products where possible, thus potentially reducing the size of the workflow. As part of the mapping, Pegasus augments the workflow with data transfer nodes to stage data in and out of the computations, data registration nodes that can update various catalogs on the grid and, more recently, nodes that can stage in statically linked binaries. The result of the mapping is a concrete workflow that can be executed on the grid by software systems such as Condor's DAGMan.
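The mapping steps described above can be illustrated with a small sketch. This is not Pegasus code: the `Job` type, the function names, and the dictionary-based replica and site catalogs are hypothetical stand-ins for the real Grid information services, jobs are assumed to be listed in dependency order, and each job is simply placed on the first site that offers its transformation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One logical transformation in the abstract workflow (hypothetical type)."""
    name: str
    inputs: tuple
    outputs: tuple


def reduce_workflow(jobs, requested, replicas):
    """Keep only the jobs still needed to produce the requested files."""
    producer = {f: job for job in jobs for f in job.outputs}
    needed, seen, stack = set(), set(), list(requested)
    while stack:
        f = stack.pop()
        if f in seen or f in replicas:
            continue  # already handled, or an existing product can be reused
        seen.add(f)
        if f not in producer:
            raise LookupError(f"no replica and no producer for {f!r}")
        job = producer[f]
        if job.name not in needed:
            needed.add(job.name)
            stack.extend(job.inputs)
    return [job for job in jobs if job.name in needed]


def map_workflow(jobs, requested, replicas, sites, output_site):
    """Return a flat, ordered list of concrete steps for the reduced workflow."""
    plan, made_at = [], {}  # made_at: file -> site where this plan produces it
    for job in reduce_workflow(jobs, requested, replicas):
        # Assumption: take the first site that advertises the transformation.
        site = next((s for s, names in sites.items() if job.name in names), None)
        if site is None:
            raise LookupError(f"no site can run {job.name!r}")
        for f in job.inputs:
            source = made_at.get(f, replicas.get(f))
            if source != site:
                plan.append(("stage_in", f, source, site))
        plan.append(("compute", job.name, site))
        for f in job.outputs:
            made_at[f] = site
    for f in requested:
        if f in made_at:  # newly derived result: ship it out and catalog it
            plan.append(("stage_out", f, made_at[f], output_site))
            plan.append(("register", f, output_site))
    return plan


# Illustrative data only: file, job, and site names are made up.
jobs = [
    Job("extract", ("raw.dat",), ("events.dat",)),
    Job("filter", ("events.dat",), ("clean.dat",)),
    Job("plot", ("clean.dat",), ("plot.png",)),
]
replicas = {"raw.dat": "siteA", "clean.dat": "siteB"}  # clean.dat already exists
sites = {"siteC": {"extract", "filter", "plot"}}
plan = map_workflow(jobs, ["plot.png"], replicas, sites, "archive")
```

Because `clean.dat` is already registered, the `extract` and `filter` jobs drop out of the plan, and only `plot` is surrounded by stage-in, stage-out, and registration steps. A real planner would emit a graph rather than a flat list and would choose sites using much richer information.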
Some of the issues that I am exploring are:
- Mapping (or planning) horizon: It is not always efficient to map the entire workflow in one shot, especially if the execution environment is very dynamic. I am investigating deferred planning techniques that map only portions of the workflow at a time (relevant paper).
- Scheduling algorithms: I am investigating various algorithms that can be used to map workflows onto the available resources (paper coming soon).
- Reliability: I am exploring the use of re-planning techniques to improve the reliability of workflow execution (slides showing the Pegasus approach).
- Exploring multiple tradeoffs in the solution space: I am exploring the use of AI planning techniques to search the large solution space of possible mappings of workflow tasks onto resources (several papers on the topic here).
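To make the idea of a planning horizon concrete, here is a minimal sketch of deferred planning. It assumes one simple partitioning scheme, grouping jobs by their depth in the dependency graph, and maps one partition at a time using whatever resource state is visible at that moment. The function names and the `pick_site` and `execute` callbacks are hypothetical and are not taken from Pegasus.

```python
def partition_by_level(deps):
    """Group jobs by depth; deps maps each job to the set of its parent jobs."""
    level = {}

    def depth(job):
        if job not in level:
            level[job] = 1 + max((depth(p) for p in deps[job]), default=-1)
        return level[job]

    for job in deps:
        depth(job)
    levels = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for job in sorted(level):
        levels[level[job]].append(job)
    return levels


def run_deferred(deps, pick_site, execute):
    """Map and run one partition at a time instead of planning everything up front."""
    history = []
    for partition in partition_by_level(deps):
        # Plan only this partition, using the resource state as it is now.
        mapping = {job: pick_site(job) for job in partition}
        for job, site in mapping.items():
            execute(job, site)
            history.append((job, site))
    return history


# Illustrative run: a diamond-shaped workflow and a toy load table.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
load = {"s1": 0, "s2": 0}


def least_loaded(job):
    return min(load, key=load.get)


def fake_execute(job, site):
    load[site] += 1  # stand-in for submitting the job and observing the outcome


history = run_deferred(deps, least_loaded, fake_execute)
```

In this toy run, the site chosen for `d` depends on the load left behind by the earlier partitions, which a one-shot mapping made before execution could not have known. Smaller partitions react faster to a changing environment but give the planner less of the workflow to optimize over at once.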
The Pegasus team is composed of Gaurang Mehta, Mei-Hui Su and Karan Vahi.
A PhD student, Gurmeet Singh, is also working on the project.
This work is supported by the National Science Foundation under the following projects: SDCI-Pegasus, GriPhyN, iVDGL, NVO and SCEC.