Networking and Digital Libraries Frequently-Asked Questions

Joe Touch

This document is IN PROGRESS

-----------

Index of questions

-----------

What are the bandwidth requirements for DLs?

The bandwidth requirements for digital libraries (DLs) are a function of the size of the objects, the bandwidth of the network connection, and the latency between a request and the presentation of the response.

(This assumes that DLs are treated as interactive distributed information access applications; see below.)

We can compute the bandwidth required to deliver an object within a given latency. The latency can be described by a 'budget', which, for interactive access, is approximately 100 ms between the request and the response. The object need only be approximately one 'screen-full', and its size can be estimated from the quality of its data. For example, text-only data is about 3-5 Kilobytes (KB), and HTML text with small icons is about 20 KB.

We have computed the relationship between object size, latency budget, and bandwidth requirements, and drawn some conclusions from this relationship.
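
As a rough illustration of the computation described above, the following Python sketch divides the object size by the time available within the latency budget. The function name, the optional round-trip parameter, and the convention that 1 KB = 1000 bytes are assumptions made for this example; they are not taken from the original analysis.

```python
def required_bandwidth_bps(object_bytes, budget_s=0.100, round_trip_s=0.0):
    """Return the bandwidth (bits/s) needed to deliver an object in time.

    object_bytes: size of one 'screen-full' object, in bytes
    budget_s:     latency budget between request and response (100 ms)
    round_trip_s: hypothetical network round-trip time, which uses up
                  part of the budget before any data can arrive
    """
    transfer_window_s = budget_s - round_trip_s
    if transfer_window_s <= 0:
        raise ValueError("round-trip time consumes the whole latency budget")
    return object_bytes * 8 / transfer_window_s


# Object sizes quoted in the text, assuming 1 KB = 1000 bytes.
for label, size_bytes in [
    ("text-only, 3 KB", 3_000),
    ("text-only, 5 KB", 5_000),
    ("HTML with small icons, 20 KB", 20_000),
]:
    mbps = required_bandwidth_bps(size_bytes) / 1e6
    print(f"{label}: {mbps:.2f} Mb/s")
```

Under these assumptions a 20 KB object needs about 1.6 Mb/s to arrive within 100 ms, and the requirement grows quickly once the round-trip time takes a share of the budget.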

-----------

What latency trade-offs are involved?

DLs can be considered one of a class of "interactive distributed information access" (I-DIA) systems (this acronym could easily be InDIA). I-DIA requirements are based on the assumption that responses must occur within 100 ms of a request to be considered interactive to humans. There are two components to the trade-off: quality vs. latency, and response-time vs. bandwidth, the latter described above.

The quality vs. latency trade-off may be most useful as a user-configurable parameter. The "conventional wisdom" on this is split - I (J. Touch) believe quality will be sacrificed for speed, whereas others at the DLI meeting (notably T. Smith, UCSB) believe quality is of primary importance. An experiment is clearly indicated.
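
One way to picture quality vs. latency as a user-configurable parameter is sketched below in Python. This is only an illustration under stated assumptions: the list of representations, their sizes (borrowed from the estimates given earlier, with 1 KB = 1000 bytes), and the function name are hypothetical, and the sketch ignores round-trip delay.

```python
# Hypothetical representations of one object, highest quality first.
# Sizes reuse the rough per-screen estimates given earlier in this FAQ.
REPRESENTATIONS = [
    ("HTML with small icons", 20_000),
    ("text-only", 5_000),
]


def best_representation(bandwidth_bps, budget_s=0.100,
                        representations=REPRESENTATIONS):
    """Pick the highest-quality representation that fits the budget.

    budget_s is the user-configurable knob: a larger budget trades
    latency for quality, a smaller one trades quality for speed.
    Returns None if nothing fits.
    """
    deliverable_bytes = bandwidth_bps * budget_s / 8
    for name, size_bytes in representations:
        if size_bytes <= deliverable_bytes:
            return name
    return None


print(best_representation(2_000_000))         # 100 ms budget
print(best_representation(500_000))           # same budget, slower link
print(best_representation(500_000, 0.5))      # user accepts 500 ms
```

A user who values speed keeps the budget at 100 ms and accepts the lower-quality form on a slow link; a user who values quality raises the budget instead.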

-----------

What existing protocols support DLs?

There are a few protocols that are relevant to the DL community, notably:

Multicasting provides a mechanism for distributing data to a set of recipients without additional server load. Packets are replicated inside the network, rather than at the source. There are several Internet RFCs (Requests for Comments) concerning multicast, notably the description of the multicast-IP protocol. Multicast IP is the basis of the MBone, or multicast backbone.
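
The effect of multicast on server load can be sketched with simple arithmetic. The Python function below is a hypothetical illustration, not a model of any particular multicast implementation: it only counts the bytes the source itself must transmit, assuming the network replicates packets for every recipient.

```python
def server_bytes_sent(object_bytes, recipients, multicast):
    """Bytes the source transmits to deliver one object to all recipients.

    With unicast the server sends one copy per recipient. With multicast
    it sends a single copy and the network replicates the packets.
    """
    if recipients <= 0:
        return 0
    copies = 1 if multicast else recipients
    return object_bytes * copies


size = 20_000  # one 20 KB object, assuming 1 KB = 1000 bytes
for n in (1, 10, 100):
    unicast = server_bytes_sent(size, n, multicast=False)
    mcast = server_bytes_sent(size, n, multicast=True)
    print(f"{n:3d} recipients: unicast {unicast} bytes, multicast {mcast} bytes")
```

The unicast cost grows linearly with the number of recipients, while the multicast cost at the source stays constant.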

Transaction-TCP (T/TCP) is a reliable, transaction-oriented protocol. T/TCP provides the reliability of the stream-oriented TCP protocol, and the packet boundaries of the unreliable UDP protocol. It caches connection state across multiple transactions, including congestion-control information. Further information on T/TCP is provided in the Internet RFCs on:

-----------

What prior-work in caching applies?

There is a wealth of prior art in caching and replicated file systems that applies to DLs. A very few are listed here. We also keep a list of general web-acceleration techniques, of which file caching is one.

-----------

What are the expected network applications?

There are several expected applications that may dominate the network traffic. The question of dominant applications affects the design of DLs, and the design of networks to support DLs, because DL traffic will interoperate with this traffic, or in fact become the dominant traffic as a result of these apps.

The question of "dominant traffic" needs qualification. In what respect is the traffic dominant:

Conventional wisdom is that mail dominates connections, because some legacy mail systems open a connection per message. Packets may be dominated by network control messages in some systems, but are usually related to the dominant data. The dominant data in the Internet until recently was FTP (file transfer) traffic, but the Web has recently surpassed FTP. Router processing is dominated by exceptions to common traffic, e.g., multicast traffic load, rather than necessarily by packet processing. When the MBone was configured using IP source routes rather than tunnels to interconnect multicast islands, source route processing became a dominant load. Server processing load is dominated by context-switching costs among multiple connections.

There are three applications that are expected to dominate the Internet:

We believe that Web traffic will be dominant, and that explicitly-transferred messages (e-mail) will be less common as information organization becomes more standard. Consider telephone directory services: the better organized the phone book is, the less likely dialed services are to be used. We also believe the Web is a member of a general class of I-DIA applications, given sufficient augmentation.

The current web is characterized as a system that is:

If we relax some of these constraints, we may see the broader class of I-DIA applications:

-----------

What objects might be used on a network?

The object model impacts the organization of data in a DL. The use of networks can affect the view of the objects, as well. Current network objects (NOs) aren't simple static objects. NOs are:

Here we present a rudimentary object taxonomy from a networker's perspective (this is a personal view - J. Touch).

NOs can be classified along the following dimensions:

  1. central or distributed data (source)
  2. central or distributed client (receiver)
  3. permanent or ephemeral
  4. transaction or stream
  5. object precedes request or request precedes object (precedence)
  6. universal data or context-sensitive data

For example, using only the first 4 dimensions of the taxonomy:

    central, perm, transact, obj first = book
       "      "       "      req first = pending book request
       "      "    stream,   obj first = movie
       "      "       "      req first = pending movie request
       "     ephm, transact, obj first = CV's, *today's* paper, web
       "      "       "      req first = subscription
       "      "    stream,   obj first = bcast by item
       "      "       "      req first = scheduled bcasts (by time)
    
    

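The example table above can also be written down as a small data structure. The Python sketch below is one hypothetical encoding: the class and field names are invented for this illustration, and it assumes the four columns of the table are source, permanence, transaction/stream, and precedence, which is how the entries read.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CENTRAL = "central"
    DISTRIBUTED = "distributed"


class Lifetime(Enum):
    PERMANENT = "permanent"
    EPHEMERAL = "ephemeral"


class Delivery(Enum):
    TRANSACTION = "transaction"
    STREAM = "stream"


class Precedence(Enum):
    OBJECT_FIRST = "object precedes request"
    REQUEST_FIRST = "request precedes object"


@dataclass(frozen=True)  # frozen makes instances hashable, so usable as dict keys
class NetworkObject:
    source: Source
    lifetime: Lifetime
    delivery: Delivery
    precedence: Precedence


C, P, E = Source.CENTRAL, Lifetime.PERMANENT, Lifetime.EPHEMERAL
T, S = Delivery.TRANSACTION, Delivery.STREAM
OBJ, REQ = Precedence.OBJECT_FIRST, Precedence.REQUEST_FIRST

# The eight rows of the example table, keyed by their classification.
EXAMPLES = {
    NetworkObject(C, P, T, OBJ): "book",
    NetworkObject(C, P, T, REQ): "pending book request",
    NetworkObject(C, P, S, OBJ): "movie",
    NetworkObject(C, P, S, REQ): "pending movie request",
    NetworkObject(C, E, T, OBJ): "CV's, today's paper, web",
    NetworkObject(C, E, T, REQ): "subscription",
    NetworkObject(C, E, S, OBJ): "broadcast by item",
    NetworkObject(C, E, S, REQ): "scheduled broadcasts (by time)",
}


def example_for(obj):
    """Return the table's example for a classification, if one is listed."""
    return EXAMPLES.get(obj, "no example listed")


print(example_for(NetworkObject(C, E, T, REQ)))
```

Combinations that the table does not cover, such as distributed sources, simply return "no example listed" in this sketch.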
-----------

What is Dartnet?

See the presentation from the DLI '95 meeting.

-----------

Last modified Feb. 8, 1996.

This page written and maintained by Joe Touch touch@isi.edu