The Replica Location Service
The replica location service (RLS) maintains
and provides access to mapping information from logical names for
data items to target names.
The Replica Location Service was co-developed by the Globus team and Work Package 2 of
the DataGrid project. The RLS prototype is currently available as an
alpha
release for testing and evaluation. The distributed RLS is intended
to replace the centralized Globus replica catalog available in
earlier releases of GT2.x. The distributed RLS provides
higher performance, reliability and scalability.
Replication of data items can reduce access latency,
improve data locality, and increase robustness, scalability and performance
for distributed applications. An RLS typically does not operate in
isolation, but functions as one component of a data grid architecture.
(Other components include services that provide reliable file transfers,
metadata management, reliable replication and workflow management.)
Our prototype RLS
implementation is based on the following mechanisms:
-
Consistent local state maintained in
Local Replica Catalogs (LRCs). Local catalogs maintain mappings
between arbitrary logical file names (LFNs) and the physical file names (PFNs)
associated with those LFNs on its storage system(s).
-
Collective state with relaxed consistency
maintained in Replica Location Indices (RLIs). Each RLI contains a set
of mappings from LFNs to LRCs. A variety of index structures can be
defined with different performance characteristics, simply by varying the
number of RLIs and amount of redundancy and partitioning among the RLIs.
-
Soft state maintenance of RLI state.
LRCs send information about their state to RLIs using soft state
protocols. State information in RLIs times out and must be periodically
refreshed by soft state updates.
Compression of state updates. This optional compression uses
Bloom Filters to summarize the content of a Local Replica Catalog
before sending a soft state update to a Replica Location Index Node.
-
Membership and partitioning information maintenance. The
current RLS implementation maintains static information about the LRCs
and RLIs participating in the distributed system. As new
implementations of the RLS are developed, they will use OGSA
mechanisms for registration of services and for service lifetime
management.
Relationship to Earlier Globus Replica Management Software
The Replica Location Service is intended
to replace replica management tools available in earlier versions
of the Globus Toolkit, incuding the Replica Catalog API and the
Replica Management API. The RLS differs from these earlier
components in several important ways.
As a distributed system, the RLS is designed to provide greater
reliability by avoiding single points of failure as well as providing better
load balancing, performance and scalability. The RLS implementation is based
on open source relational database technology. Finally, the RLS strictly
separates replication information from other types of metadata; unlike the
original replica catalog, the RLS does not include information about logical
collections, but assumes such information is stored in a separate metadata
service.