Replica Management in Data Intensive Distributed Science Applications

Title: Replica Management in Data Intensive Distributed Science Applications
Publication Type: Book Chapter
Year of Publication: 2012
Authors: A. Chervenak and R. Schuler
Secondary Authors: T. Kosar
Book Title: Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management
Publisher: IGI Global
ISBN Number: 9781615209712

Managing the large data sets produced by data-intensive scientific applications is complicated by the fact that participating institutions are often geographically distributed and belong to distinct administrative domains. A key data management problem in these distributed collaborations has been the creation and maintenance of replicated data sets. This chapter provides an overview of replica management schemes used in large, data-intensive, distributed scientific collaborations. Early replica management strategies focused on developing robust, highly scalable catalogs for maintaining replica locations. In recent years, more sophisticated, application-specific replica management systems have been developed to support the requirements of scientific Virtual Organizations. These systems have motivated interest in application-independent, policy-driven schemes for replica management that can be tailored to meet the performance and reliability requirements of a range of scientific collaborations. The authors discuss data replication solutions that address the challenges posed by increasingly large data sets and by the need to run data analysis at geographically distributed sites.