SCI/NMI/SGER: Towards Cognitive Grids: Knowledge-Rich Grid Services for Autonomous Workflow Refinement and Robust Execution

 

Ewa Deelman, USC Information Sciences Institute (PI)

Yolanda Gil, USC Information Sciences Institute (co-PI)

www.isi.edu/cognitive-grids

 

 

Goal: The goal of this work was to explore the use of semantic technologies within the grid environment. In particular, this work focused on integrating semantic technologies into scientific workflow management systems such as Pegasus and Wings.

 

Approach: Providing Semantic Information on Top of Existing Grid Services

Distributed scientific analyses are often represented as workflows that specify sequences of analysis and simulation operations that need to be performed to achieve a specific scientific goal. The design of scientific workflows is complex, since they may consist of thousands of steps with diverse execution requirements such as access to data, computation resources, and instruments. These workflows are executed in a distributed, heterogeneous environment that is highly dynamic, opportunistic, and failure-prone, where resources are shared and access policies and resource availability can vary over time. Distributed computing environments such as NSF’s TeraGrid provide information and execution services that support the execution of these workflows. However, these environments lack the ability to autonomously plan and manage the execution of large scientific workflows.

 

This research combines Artificial Intelligence and distributed computing techniques to create knowledge-rich services that can support the discovery of data and workflow provenance information necessary to find and evaluate the quality of raw and derived data products. The main foundation is provided by expressive formal representations of metadata, provenance, and the workflow composition processes. These representations enable:

·        data discovery across multiple metadata catalogs using community-based metadata attributes

·        the development of heuristics that support the propagation of metadata and provenance information during workflow construction
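The two capabilities listed above can be illustrated with a minimal Python sketch. It is not the project's implementation: the catalog names, the attribute mapping, and the functions `discover` and `propagate` are all hypothetical and chosen only for illustration. The sketch assumes each catalog uses its own field names and that a shared, community-level vocabulary can be translated into them.

```python
from dataclasses import dataclass

# Hypothetical mapping from community-level attribute names to the
# field names each catalog uses locally (illustrative values only).
COMMUNITY_TO_LOCAL = {
    "catalog_a": {"instrument": "instr", "region": "sky_region"},
    "catalog_b": {"instrument": "telescope", "region": "area"},
}


@dataclass
class Catalog:
    name: str
    records: list  # each record is a dict of local field name -> value


def discover(catalogs, query):
    """Return (catalog name, record) pairs matching a community-level query."""
    hits = []
    for cat in catalogs:
        mapping = COMMUNITY_TO_LOCAL.get(cat.name, {})
        # Skip catalogs that cannot express every attribute in the query.
        if not all(attr in mapping for attr in query):
            continue
        # Translate the query into this catalog's own vocabulary.
        local_query = {mapping[attr]: value for attr, value in query.items()}
        for rec in cat.records:
            if all(rec.get(k) == v for k, v in local_query.items()):
                hits.append((cat.name, rec))
    return hits


def propagate(inputs_metadata, step_name):
    """Assumed heuristic: a derived product keeps only the attributes on
    which all of its inputs agree, plus a record of the producing step."""
    if not inputs_metadata:
        return {"derived_by": step_name}
    shared = dict(inputs_metadata[0])
    for md in inputs_metadata[1:]:
        shared = {k: v for k, v in shared.items() if md.get(k) == v}
    shared["derived_by"] = step_name
    return shared
```

A query such as `discover(catalogs, {"instrument": "X", "region": "r1"})` is phrased once in the shared vocabulary and evaluated against each catalog in its own terms. The propagation rule shown is only one possible heuristic; richer rules would depend on the semantics of each workflow step.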

 

This research was conducted in the context of two NSF-funded applications: the National Virtual Observatory (NVO) and the Southern California Earthquake Center (SCEC). The ultimate aim of this research is to improve the productivity of scientists by enabling them to discover data based on semantic attributes and to examine the provenance of workflow-generated data in support of result validation, verification, and replication.

 

As part of this work, we explored the following aspects of the problem:

 

Relevance of Agent-based and Grid-based technologies and their synergies.

More information about this work can be found in the following paper: On Agents and Grids: Creating the Fabric of a New Generation of Distributed Intelligent Systems, Yolanda Gil. Journal of Web Semantics, Volume 4, Issue 2, June 2006.

 

Semantic-based data discovery in support of the discovery of input data needed by a scientific workflow. More information can be found in the following paper: Metadata Catalogs with Semantic Representations, Yolanda Gil, Varun Ratnakar, and Ewa Deelman. International Provenance Annotation Workshop (IPAW-06), Chicago, IL, May 3-5, 2006.

 

Providing a semantic-rich foundation for workflow provenance capture and discovery. As part of this work, we participated in the Provenance Challenge--our entry can be found here. The following paper describes our approach: Jihie Kim, Ewa Deelman, Yolanda Gil, Gaurang Mehta, Varun Ratnakar. “Provenance Trails in the Wings/Pegasus System,” to appear in Concurrency and Computation: Practice and Experience, Special Issue on the First Provenance Challenge, 2007.

 

Formalizing the workflow lifecycle. We also started to formalize the workflow generation and execution process and identified the challenges ahead. This was described in an invited paper to the Scientific Workflows and Business Workflow Standards in e-Science in Amsterdam, The Netherlands, December 2006: Managing Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges, Ewa Deelman, Yolanda Gil.

 

This work is supported by the National Science Foundation under grant number SCI-0455361.