Grid Projects and Collaborations

I am involved in a number of scientific collaborations that use Globus services to manage their large data sets.

SciDAC-2: Scaling the Earth System Grid to Petascale Data (ESG)
DOE SciDAC Project Description: Sharing a World of Data
Project Summary
Earth System Grid project web site
The Earth System Grid: Supporting the Next Generation of Climate Modeling Research, D. Bernholdt, et al., Proceedings of the IEEE, vol. 93, no. 3, pp. 485-495, March 2005.

The Earth System Grid project is a collaboration whose goal is to provide middleware infrastructure to support the next generation of climate modeling research. ESG has existed for more than five years. The first phase of the project provided portals at the National Center for Atmospheric Research and Lawrence Livermore National Laboratory that make key climate modeling data sets available to scientists for evaluation. These data sets include results from the Parallel Climate Model (PCM), the Community Climate System Model (CCSM), and the simulations run for the Intergovernmental Panel on Climate Change (IPCC). Under the SciDAC-2 program, ESG has received additional funding to provide expanded and distributed infrastructure, in particular to support the growing number of sites running IPCC climate simulations and the growing size of climate data sets. My role in ESG is to work on the new distributed data architecture, including the use of Globus data services such as RLS and GridFTP as well as the design of a distributed metadata catalog. I also lead efforts to use the Globus Monitoring and Discovery System to monitor the state of the distributed ESG infrastructure.

SciDAC-2: Center for Enabling Distributed Petascale Science (CEDPS)
DOE SciDAC Project Description: Getting the Science Out of the Data
CEDPS project web site

On the CEDPS project, I lead the development of data services, an effort that spans multiple institutions. Our work at ISI focuses on Data Placement services. The goal is to allow Virtual Organizations to specify high-level policies that describe how their data should be placed and replicated in their distributed environment; Placement Services are responsible for carrying out these policies. Other data services work in CEDPS includes the Managed Object Placement Service, an extension of earlier work on GridFTP, NeST, and dCache that will be carried out by groups at ANL, U. Wisconsin, and Fermilab. In addition to data services, the CEDPS project also includes major efforts on scalable services and troubleshooting.

Community Driven Improvement of Globus Software (CDIGS)
NSF OCI: Collaborative Research: Community Driven Improvement of Globus Software
CDIGS project web site

The CDIGS project funds the ongoing development and support of Globus services, with an emphasis on allowing the scientific community to drive the development of new functionality based on the needs of their applications. CDIGS funds basic support for our work on the Globus Replica Location Service and the Data Replication Service. Outreach to communities is an important aspect of the CDIGS project, with the goal of understanding their needs and letting these needs drive both short-term and long-term


Collaborations:

In addition to the projects already described, we work closely with several application groups.

Laser Interferometer Gravitational Wave Observatory (LIGO) Project

The LIGO project uses the Globus Replica Location Service as part of its production Grid infrastructure. LIGO deploys the RLS at ten sites and stores mappings from more than eleven million logical file names to more than 120 million physical replicas of files. We also worked closely with Scott Koranda and Brian Moe of the LIGO team on the design of the Globus Data Replication Service, which is modeled after some of the functionality that they developed for the LIGO Lightweight Data Replicator System (LDR).

Linked Environments for Atmospheric Discovery (LEAD) Project

The LEAD project uses the Globus Replica Location Service and the Data Replication Service to manage data replication among multiple LEAD sites.

Geosciences Network (GEON) Project

The GEON project has also used the Globus Replica Location Service and the Data Replication Service to manage data replication among multiple GEON sites.