USC / Information Sciences Institute
touch@isi.edu
ABSTRACT: ZAPT is a desktop multiparty audio/video teleconferencing system at ARPA, based on Bellcore's Touring Machine. ZAPT supports one multiparty call (up to 4 parties), and any number of point-to-point calls, with analog multimedia. ZAPT supports automated remote operation and conference "join" requests. ZAPT provides interoffice conferencing at ARPA, and manual cross-connect to Internet IETF-style conferencing tools.
ZAPT's primary goal was high-quality, low cost, multiparty desktop teleconferencing at ARPA. ZAPT uses existing software, to minimize delivery time, and also uses existing hardware at ARPA, specifically the NeXT systems and analog office cabling. Security was also a goal, and interoffice analog wiring was deemed sufficiently secure.
TM contains two user interfaces: MTS (low-level) and CRUISER (high-level), implemented in X11, on Sun-3 desktop computers, with independent analog audio and video cabling. MTS was chosen because it was more portable, as we will discuss later. The TM manages the analog connections using switching and bridging equipment residual from prior testbeds at Bellcore. The ZAPT system is shown in Figure 1, and its architecture is dominated by the TM components. The TM provides an analog local distribution system, which ZAPT augments with back-end connections to other conferencing systems via the Internet.

In the TM, existing IP transmissions (via Ethernet) are used for control, and an analog crossbar provides audio and video switching. At Bellcore, Sun-3s are used to control separate user-end analog equipment (camera, monitor, microphone, speaker). At ARPA, the analog video monitor is provided by the NeXT via the NeXTDimension board, using the NeXTtv application to display real-time NTSC video on the color monitor (Figure 2). At both Bellcore and ARPA, analog signals are carried by a distinct cabling network, and digital and analog interact only at the crossbar switch. The bridging at ARPA is completely passive (signal-independent analog mixing only), whereas Bellcore's is active (includes signal-dependent mixing as well as switching).



Analog video is displayed on the NeXT via NeXTtv, a demo application provided with NeXTDimension boards. A sample video window, showing quadrant-split multiparty teleconferencing, is shown in Figure 5. NeXTtv comes up "off" - users must manually "turn on" the tv, using the "power" button in the lower left corner.

MTS and SMGR are TM X-windows user interfaces that run directly on the NeXT, under co-Xist (a NeXT application that provides X-windows compatibility). MTS is a Bellcore TM client application for conferencing control designed for computer scientists. Bellcore's Cruiser client, designed for casual users, was not used because it relies on X-windows extensions not available under co-Xist. Figure 6 shows a sample of the MTS user interface and SMGR endpoint control tool, both of which are unchanged from Bellcore's Sun-3 version, except for the NeXT-style window manager "iconize" and "close" buttons (see Note).
The MTS interface provides call initiation and response, as well as multiple call management. A user selects the called parties from a static user list, and initiates a call. Calls can be accepted or refused by the user, or SMGR can auto-accept (discussed below). Calls in progress can be suspended, resumed, or brought to the foreground (in the case of multiple calls in progress). An individual call can have its audio in, audio out, video in, or video out individually activated or deactivated on a per-user basis. A call can also be in "split screen" mode, or point-to-point mode. Calls to more than one other party automatically initiate split-screen mode. Up to 3 other parties can be part of a single call (for a total of 4, including the caller). Only one multi-party call can be in session at a time in this installation of ZAPT, although any number of point-to-point calls can occur simultaneously. In addition, individual users can be added or deleted from a current call, at the discretion of any current member of the call.
In addition to MTS, each station has a control interface, called SMGR, also shown in Figure 6. A station is defined as a set of endpoints, shared among the user client processes. SMGR is used to change the default policy for session requests from manual to auto-accept, auto-deny, or pass-through (not used for MTS). It also shows the status of the endpoints of a station.

3.3.1 Bridging
Bellcore uses an (expensive) active bridging system as part of their TM installation. Bellcore provided the TM-side of this software, which was modified for use with passive bridging equipment in ZAPT. The video bridge at both ZAPT and Bellcore was passive, and we configured an inexpensive TEAC Tascam studio mixer to provide passive audio bridging in ZAPT as well. Bridging reservation requests are satisfied on a first-come first-served basis.
Active bridging is signal-dependent mixing; in Bellcore's case, the mixed signal output is a "loudest 2" normalized summation of the inputs. In addition, Bellcore's bridging system performs some switching as well. Passive bridging is signal-independent mixing; in ZAPT, each output is an "all but me" sum of the inputs (regardless of signal on those inputs). One advantage of ZAPT's approach is its ability to correctly compose multiple bridges. To compare the two, consider 4 sites in conference, W and X near bridge A and Y and Z near bridge B. Bridge A unites W, X, and the output of B. In Bellcore's case, W hears 0.5 X + 0.25 Y + 0.25 Z, so the remote participants are not as loud. In ZAPT, W hears X + Y + Z, as it would in a conference room, removing only direct feedback. "All but me" bridging is limited to small groups, because the background noise of every input is added to the mix.
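The W, X, Y, Z comparison can be reproduced with a short sketch. It is illustrative only: signals are modelled as maps from source name to gain, `normalized_bridge` is a simplified stand-in for "loudest 2" mixing that assumes all inputs are equally loud and that each port has at most two other inputs, and all function names are hypothetical.

```python
def mix(weighted_signals):
    """Sum (gain, signal) pairs; a signal maps source name -> gain."""
    out = {}
    for gain, signal in weighted_signals:
        for source, g in signal.items():
            out[source] = out.get(source, 0.0) + gain * g
    return out


def passive_bridge(inputs):
    """'All but me': each port receives the plain sum of every other port."""
    return {
        port: mix((1.0, sig) for other, sig in inputs.items() if other != port)
        for port in inputs
    }


def normalized_bridge(inputs):
    """Assumed stand-in for 'loudest 2' normalized mixing (equally loud inputs):
    each port receives the average of the other ports."""
    result = {}
    for port in inputs:
        others = [sig for other, sig in inputs.items() if other != port]
        result[port] = mix((1.0 / len(others), sig) for sig in others)
    return result


def what_w_hears(bridge):
    w, x, y, z = ({name: 1.0} for name in "WXYZ")
    # Bridge B joins Y, Z, and a trunk to A; its trunk output excludes the trunk input.
    b_to_a = bridge({"Y": y, "Z": z, "trunk": {}})["trunk"]
    # Bridge A unites W, X, and the output of B.
    return bridge({"W": w, "X": x, "trunk": b_to_a})["W"]


print(what_w_hears(passive_bridge))     # gains of X, Y, Z as heard by W
print(what_w_hears(normalized_bridge))
```

Under these assumptions the passive composition gives every remote party unit gain, while the normalized composition halves the gain at each bridge a signal crosses.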
ZAPT is the only bridging TM outside of Bellcore itself. Bellcore's bridge manager software component of the TM performs trunk reordering to undo some of the switching done by their active bridge. ZAPT's bridging software relies on the TM resource allocation mechanism to preserve trunk order, an assumption that may not be valid for multinode switching using Bellcore's version of resource management. Bellcore has reincorporated the ZAPT modifications into their software, and is examining their implications for resource management in the TM.
3.3.2 Switching
The ZAPT switching control software is adapted from code MIT developed for use with their TM installation. ZAPT slows the command stream down with fixed delays to prevent serial line overrun, and verifies all serial port writes with echo-reads, resulting in more reliable operation and fault attribution. Diagnosis of the overrun problem was facilitated by the development of SoftPanel, a curses-based (terminal-type independent full-screen ASCII) tool for control of the switch using an ASCII terminal (Appendix C). SoftPanel has been given to the crossbar switch manufacturer, and has also been made available on the Internet via anonymous FTP.
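The delay-and-echo discipline can be sketched as follows. This is not the ZAPT or MIT code: the port object, the `LoopbackPort` stand-in, the command strings, and the delay value are all assumptions made for illustration, and the sketch assumes the switch echoes each command verbatim.

```python
import time


class EchoError(Exception):
    """Raised when the switch does not echo back what was written."""


def send_commands(port, commands, delay_s=0.1):
    """Write each command, verify its echo, then pause before the next one."""
    for command in commands:
        port.write(command)
        echo = port.read(len(command))
        if echo != command:
            # A mismatch attributes the fault to this specific command.
            raise EchoError(f"sent {command!r}, echoed {echo!r}")
        time.sleep(delay_s)  # fixed delay to avoid overrunning the serial line
    return len(commands)


class LoopbackPort:
    """Hypothetical stand-in for a serial port that echoes every byte."""

    def __init__(self):
        self.pending = b""

    def write(self, data):
        self.pending += data

    def read(self, n):
        data, self.pending = self.pending[:n], self.pending[n:]
        return data


# Hypothetical command strings; the real switch syntax is not given here.
send_commands(LoopbackPort(), [b"C 1 2\r", b"C 3 4\r"], delay_s=0.01)
```

Verifying each echo before sending the next command means a dropped or corrupted byte is detected at the command that caused it, rather than surfacing later as a mis-set crosspoint.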
3.3.3 Cross-coupling
ZAPT can interconnect with Internet teleconferencing via manual cross-coupling (see Note). Side-effect cross-coupling uses an automated TM client and other Internet teleconferencing tools independently. The design is the equivalent of a "null modem" cable (Figure 7). These connections require a manual connection on each side of the conference, i.e., a rendezvous at a proxy. Users never receive calls initiated from the proxies, in this case. As far as the TM clients are aware, they have connected to an office called "external". As far as the external client (e.g., MMCC) is aware, the local client has switched the audio and video to a different conference room.
Fully automated cross-coupling would avoid separate control actions on the two conferencing systems. A user on TM calling an external user would result in a call to the proxy, and the proxy would complete the link by initiating an external call to the external user. This requires a connection protocol that is compatible with both TM and the external system semantics. We designed an interoperation protocol for this purpose, but it was equivalent to bypassing the TM entirely, and thus was not implemented (Appendix A). A proxy needs to emulate the protocol at the user client level. The client protocols for external Internet teleconferencing tools were documented (e.g., MMCC [Sc92a] [Sc93]); the TM client protocol was not. We developed a packet prober to determine the TM protocol, and developed an extension to it for proxy operation (Appendix A). We also developed a protocol for communication between the TM and MMCC clients, equivalent to a 2-party connection protocol (Appendix A).

ZAPT uses the MTS client, in conjunction with an SMGR client modified to auto-accept, to support call management via side-effect cross-coupling. Side-effect cross-coupling is the link formed when two teleconferencing systems call a "null modem" - the link is a side effect of the call, not due to connection status crossing the link boundary. This supports only the conventional TM call model, where only existing call members can add additional parties to a current call.
ZAPT also supports radio-like broadcasts, in which users join independently via the Radio proxy application. The Radio proxy is an application that replaces MTS. The Radio responds to call requests to add that user to the current broadcast session (see Note). The SMGR client was modified to avoid the X-windows interface, to permit automated operation. We used the Bubble trace program (Appendix C) to determine a protocol for the Radio proxy. The Radio proxy was not released in ZAPT, because the protocol was optimistic only (thus not fault tolerant). Further detail of the Radio proxy appears in Appendix A.
We also attempted to augment the Radio proxy to perform both broadcast and point-to-point direct calls. The proxy was to perform dual registration, as Radio and MTS clients; calling the Radio would result in a callback join to a broadcast session, and calling the MTS proxy would result in point-to-point auto-accept call operation. The Radio and MTS proxy had to be implemented in a single module, because they share state. When the Radio had a broadcast session active, the MTS proxy should respond "busy" to all requests, and vice versa. The dual client was not completed, because of the lack of fault tolerance, as noted above.
3.3.4 Failure detection and recovery
ZAPT augments the TM model of system fault-tolerance, because ARPA requires the privacy of fail-stop operation. Failure of the control system should result in failure of the audio and video links; otherwise, transmissions continue without explicit action. Individual components of the TM implement some simple fault tolerance, but due to a lack of soft-resets the set of components is not tolerant of individual component failure. TM's failure mode also keeps a connection up when components fail (fail-stay), rather than keeping them down (fail-stop) as ARPA requires. The TM architecture did not permit a fail-stop design, in general, but a specific version of fail-stop, forcing entire system reboot, was possible. ZAPT also supports self-restart in the event of failure.
We designed SafeNet to manage fault tolerance, and incorporated it into the ZAPT system. SafeNet is a process that monitors the components of the TM, and respawns failed components in the proper order, either at the system or user level (Figure 8).
The TM consists of a core, composed of 7 independent components, and user sites, each composed of 5 components. TM core state information is kept when the core fails, but can't be reset or resynchronized once the TM is running. TM core state information needs to be reset to some known state, but the TM user clients aren't aware of that state. As a result, user clients can fail and restart while the core is running, but core failure requires that user clients restart after the core restarts. Ideally, the core SafeNet would signal the user SafeNets to restart after it rebuilds the core. In the current system, a user attempting to initiate a call after a core restart is told he doesn't exist, and must infer his own restart action.

The core system configuration in Figure 8 indicates that when any system component fails, the entire system is restarted. The user configuration in Figure 8 indicates that the user components exhibit a restart structure, i.e., that MTS can be restarted without restarting other components, but that STATION failure requires restarting SMGR and MTS. The SafeNet software is also integrated into the ZAPT Manager NeXT-style application.
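The two restart policies can be sketched as below. This is a sketch, not SafeNet itself: it assumes the user components form a simple chain in start order (STATION, then SMGR, then MTS), covers only the three components named in the text, and uses hypothetical placeholder names for the core components.

```python
# Start order for the user-level components named in the text (assumed chain:
# each component depends on the ones before it).
USER_CHAIN = ["STATION", "SMGR", "MTS"]


def restart_set(failed, chain, restart_all=False):
    """Return the components to respawn, in start order, after `failed` dies.

    restart_all=True models the core policy (any failure restarts everything);
    otherwise the failed component and everything after it are restarted.
    """
    if failed not in chain:
        raise ValueError(f"unknown component: {failed}")
    if restart_all:
        return list(chain)
    return chain[chain.index(failed):]


print(restart_set("MTS", USER_CHAIN))      # MTS restarts alone
print(restart_set("STATION", USER_CHAIN))  # STATION failure also restarts SMGR and MTS
# Hypothetical core component names; the text does not list the 7 core components.
print(restart_set("core_b", ["core_a", "core_b", "core_c"], restart_all=True))
```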

ZAPT also provides fault-tolerance via module restart, and inexpensive passive bridging. What we learned about fault tolerance and passive bridging, as well as about the TM model in general, has been shared with the research community. Our crossbar debug module has been made available on the Internet via anonymous FTP, and has been given to XN Technologies (the switch manufacturer).
Finally, the experience of developing and installing ZAPT has influenced our models of multimedia teleconferencing from another viewpoint. We have attempted to begin such a model, called "M3" - the Multicast Multipoint Model [To92]. Our model supports conference merging and subconferencing, neither of which is supported in MMCC or the Touring Machine. It also permits co-mingling of different data stream types, permitting, e.g., audio to go to an oscilloscope whose video goes to a monitor. MMCC and TM, as well as some proposed IETF MMUSIC Working Group models [MM93], currently enforce strict stream partitioning by type. In addition, we recommend a dynamic-state model, in which resources and state have a "free" ground state, and are kept reserved only by continual refresh requests. This permits a true fail-stop system. The physical crossbar should be part of a logical switching system that includes a logical crossbar and logical input and output gates, where session control governs the logical crossbar, and user end-control governs the gates, as noted in the general recommendations. These concepts are being developed to influence both the MMUSIC and SPT models [Sc93].
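The dynamic-state recommendation can be illustrated with a small reservation table in which every entry decays to "free" unless refreshed. This is a minimal sketch under stated assumptions, not part of M3: the class name, the lifetime value, and the resource and owner names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class SoftStateTable:
    """Reservations that return to the 'free' ground state unless refreshed."""

    def __init__(self, lifetime_s):
        self.lifetime_s = lifetime_s
        self.held = {}  # resource -> (owner, expiry time)

    def expire(self, now):
        """Drop every reservation whose refresh deadline has passed."""
        for resource in [r for r, (_, t) in self.held.items() if t <= now]:
            del self.held[resource]

    def refresh(self, resource, owner, now):
        """Reserve or renew a resource; fails if another owner still holds it."""
        self.expire(now)
        holder = self.held.get(resource)
        if holder is not None and holder[0] != owner:
            return False
        self.held[resource] = (owner, now + self.lifetime_s)
        return True

    def is_free(self, resource, now):
        self.expire(now)
        return resource not in self.held


table = SoftStateTable(lifetime_s=5.0)
table.refresh("trunk1", "alice", now=0.0)  # alice reserves the trunk
table.refresh("trunk1", "alice", now=4.0)  # a refresh extends the reservation
print(table.is_free("trunk1", now=10.0))   # alice stopped refreshing, so it decays
```

If the owner or the control path fails, refreshes stop and the resource is released without any explicit teardown message, which is the fail-stop behavior the text calls for.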
Video teleconferencing has become more widespread, due to the efforts of the IETF. The MBONE now reaches over 500 hosts world-wide with multicast IP (see Note 1) [Ca92], and casual broadcast conferences are a daily occurrence. This so-called "IETF-style" teleconferencing has become popular, using Sun SPARCs, SGI Indigos, and DEC 5000s. Unfortunately, NeXTs are not among those supported, due in part to limited access to their OS sources (no multicast IP is available; see Note 2), and differences in windowing systems (X11 isn't native). The Internet teleconferencing tools were not sufficiently mature when ZAPT began in 1992, but have become pervasive in the past year. As a result, ZAPT should eventually be replaced with the evolving Internet tools.
We also evaluated the NeXT as a general audio/video teleconferencing platform, in consideration of supporting existing Internet digital teleconferencing tools. The NeXT hardware and operating system were evaluated for IP multicasting, continuous digital audio, and continuous digital video capability. Both operating system and hardware were found to be lacking, at this time. There are other competing, proprietary digital teleconferencing systems for the NeXT hardware. Most use the SCSI Digital Eyes video digitization hardware, and provide conferencing only over a local network. The systems use Ethernet broadcast, rather than IP multicast, for data distribution. One system, called "Collaborate" (a commercial product from Trident Data Systems), provides audio and video, although with high latency (2 seconds). Another, called "Radio" (freeware from CWI), provides only audio. Neither system is capable of wide-area teleconferencing, either spanning multiple LANs or over large latencies, and both use audio and video formats not compatible with Internet video teleconferencing.
Digital packet audio and video on the NeXT were also examined as part of this effort. The Internet currently uses a suite of components that provide loose-style teleconferencing. The results of this evaluation are described in Appendix B. Also, using the Internet for regular teleconferencing requires resource reservation. There are no automatic tools for reservation of DARTnet at this time, although some are pending implementation. DARTnet currently has a manual mechanism for reservation of the switching nodes for protocol experiments, but this does not include the end-system equipment. This service should be included in any future plans to provide operational teleconferencing.
We would also like to thank Eve Schooler and Steve Casner of USC/ISI's MMC teleconferencing project, for support in the side-effect cross coupling, as well as helping ZAPT fit into the larger teleconferencing environment they designed at both USC/ISI and ARPA. We also thank Jon Postel of USC/ISI for his assistance.
Suzanne Woolf and Ray Bates of USC/ISI helped immensely with NeXT-specific issues, as well as with the NeXT-style user interface implementation. We would also like to thank Andrew Heybey and Mark Uhrmacher of MIT, for their switch control module, and help in debugging the hardware.
[Ar93] Arango, M., et al., "Touring Machine System," Communications of the ACM, Vol. 36, No. 1, January 1993, pp. 69-77.
[Ca90] Casner, S., Seo, K., Edmond, W., and Topolcic, C., "N-Way Conferencing With Packet Video," Third International Workshop on Packet Video, Morristown NJ, Mar. 1990.
[Ca92] Casner, S., and Deering, S., "First IETF Internet Audiocast," ACM Sigcomm, July 1992. Also available as ISI Reprint Series IS/RS-92-293.
[Fr92] Frederick, R., The nv `network video' tool program Unix `man' pages. Nv is available via anonymous FTP from parcftp.parc.xerox.com.
[Ja92] Jacobson, V., The vat `visual audio tool' program Unix `man' pages. Vat is available via anonymous FTP from ftp.ee.lbl.gov.
[MM93] MMUSIC, The IETF MMUSIC Working Group (Multiparty Multimedia Session Control), A. Weinrib and E.M. Schooler, chairs, Proceedings of the Twenty-Seventh Internet Engineering Task Force, Amsterdam the Netherlands, July 1993, pp. 417-430.
[Sc92a] Schooler, E.M., "An Architecture for Multimedia Connection Management," Proc. 4th IEEE ComSoc International Workshop on Multimedia Communications, Monterey CA, Apr. 1992.
[Sc92b] Schulzrinne, H., "Voice Communication Across the Internet: A Network Voice Terminal", Dept. of Elec. Eng. Tech. Report, Univ. of Massachusetts Amherst, July 1992. Also the nevot network audio tool Unix `man' pages. Nevot is available via anonymous FTP from gaia.cs.umass.edu.
[Sc93] Schooler, E.M., "Case Study: Multimedia Conference Control in a Packet-Switched Teleconferencing System," Journal of Internetworking, Vol. 4 No. 2, June 1993, pp. 99-120.
[To92] Touch, J.D., "Multiparty Connections," Internal Memo, USC/ISI, Marina del Rey CA, Nov. 1992.
The TM is described by a set of operations - initialization, database access, call initiation (outgoing), and call indication (incoming), each of which needs to be modified to accommodate proxies. A proxy needs to register each client on whose behalf it acts, in addition to itself, via registerClient (Figure 10). Calls to these clients need to be translated into a call to the proxy; a "Bubble" is inserted between user clients and the central system, to effect these translations (Figure 11). Nameserver replies are filtered by the Bubble, to hide proxies from user clients (Figure 11).


Consider calls from the TM to the remote client (proxy call requests), and calls from the remote client to a user inside the TM (proxy call indications). TM users register via registerClient (Initial). A TM call starts with a user sessionCreate, acknowledged by a sessionRequestReceived message (Create). The TM core sends sessionActionRequest to all members, and collects a sessionActionAccept, sessionActionDenied, or timeout for each (Reply). Eventually, a sessionActionCommit or sessionActionAbort is sent to each member (including the initiator), indicating the result of the call (Accept). This protocol is denoted in Figure 12.
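The Reply and Accept phases can be sketched as a single decision function. The message names come from the text; the rule that a session commits only when every member accepts is an assumption, since the text does not state the commit criterion, and the function name and user names are hypothetical.

```python
ACCEPT = "sessionActionAccept"
DENIED = "sessionActionDenied"


def session_outcome(members, replies):
    """Map each member to the final message it receives.

    `replies` maps a member to ACCEPT or DENIED; a member missing from
    `replies` is treated as a timeout. ASSUMPTION: unanimous acceptance is
    required for sessionActionCommit; anything else yields sessionActionAbort.
    """
    accepted = all(replies.get(member) == ACCEPT for member in members)
    result = "sessionActionCommit" if accepted else "sessionActionAbort"
    # The result is sent to every member, including the initiator.
    return {member: result for member in members}


members = ["initiator", "user_a", "user_b"]  # hypothetical user names
print(session_outcome(members, {m: ACCEPT for m in members}))
print(session_outcome(members, {"initiator": ACCEPT, "user_a": DENIED}))  # user_b timed out
```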

When a TM user performs a call request in the augmented protocol, remote users in the initial sessionCreate are checked in the nameserver by the Bubble, and replaced by the indicated proxy (Create), which has already registered all accessible users a priori (Initial). The request is forwarded to the central system, which sends it to the proxy (Reply). The proxy executes an external protocol to connect to the remote proxy, and accepts or denies the request as indicated. The reply from the proxy is translated in the Bubble at the TM user, who receives the final reply (Reply). See Figure 13 for details.
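The two Bubble translations, replacing remote users with their proxy in a sessionCreate and hiding proxies in nameserver replies, can be sketched as follows. The nameserver representation (a map from user to proxy name, with None for local users) and all names are hypothetical; the actual TM nameserver format is not documented in this report.

```python
def translate_session_create(members, nameserver):
    """Replace each remote user with its proxy; local users pass through.

    `nameserver` maps a user to the proxy that registered it, or to None for
    a local TM user (hypothetical schema). Duplicate proxies are collapsed so
    the proxy is called once per session.
    """
    translated = []
    for user in members:
        target = nameserver.get(user) or user
        if target not in translated:
            translated.append(target)
    return translated


def filter_nameserver_reply(entries, proxies):
    """Hide proxy entries from user clients."""
    return [entry for entry in entries if entry not in proxies]


nameserver = {"alice": None, "remote_bob": "external", "remote_carol": "external"}
print(translate_session_create(["alice", "remote_bob", "remote_carol"], nameserver))
print(filter_nameserver_reply(["alice", "external", "remote_bob"], {"external"}))
```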

Call indications in the augmented protocol originate at the remote proxy, by a connection request. The proxy registers the clients of the incoming call, and sends a sessionCreate to the central TM (Create). The rest of the protocol proceeds inside the TM side as before, and finally an acknowledgment is sent to the remote proxy, as in Figure 14 (Accept). Other actions occur as in the proxy-extensions for the call request protocol, as in Figure 13.

There is a separate protocol between the local and remote proxies, representing two-party connection establishment. The state machine and events are shown in Figure 15.

The notation in these two figures is as follows. There are 5 types of messages - connect request (CR), connect accept (CA), connect deny (CD), disconnect request (DR), and disconnect accept (DA). Prefixes indicate the source or sink of the message: U indicates user messages, X denotes external (remote party). For example, a user connection request is UCR. In the state diagram, "to" indicates a timeout, "*" is no message, and errors are not shown. The states are named connected (CON), disconnected (DIS), user connecting (UCON), user disconnecting (UDIS), external connecting (XCON), and external disconnecting (XDIS).
We have described the differences between the TM and automated client protocol; in all other respects they should be equivalent. In particular, the timeout and fault tolerant behavior is not affected by these extensions, but would need to be implemented to provide an environment in which to effect these modifications.
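The notation lends itself to a table-driven state machine. The state and message names below are taken from the text, but the transition table is a plausible guess at a two-party connection protocol and is not a transcription of Figure 15. In particular, treating every timeout as a return to DIS is an assumption.

```python
# (state, message) -> next state. ASSUMED transitions, not taken from Figure 15.
TRANSITIONS = {
    ("DIS", "UCR"): "UCON",   # user asks to connect
    ("UCON", "XCA"): "CON",   # external side accepts
    ("UCON", "XCD"): "DIS",   # external side denies
    ("UCON", "to"): "DIS",    # timeout while connecting
    ("DIS", "XCR"): "XCON",   # external side asks to connect
    ("XCON", "UCA"): "CON",   # user accepts
    ("XCON", "UCD"): "DIS",   # user denies
    ("XCON", "to"): "DIS",
    ("CON", "UDR"): "UDIS",   # user asks to disconnect
    ("UDIS", "XDA"): "DIS",   # external side acknowledges
    ("UDIS", "to"): "DIS",
    ("CON", "XDR"): "XDIS",   # external side asks to disconnect
    ("XDIS", "UDA"): "DIS",   # user acknowledges
    ("XDIS", "to"): "DIS",
}


def run(messages, state="DIS"):
    """Feed a message sequence through the table and return the final state."""
    for message in messages:
        # A KeyError here corresponds to an error case, which the figure omits.
        state = TRANSITIONS[(state, message)]
    return state


print(run(["UCR", "XCA"]))                # user-initiated connection
print(run(["UCR", "XCA", "XDR", "UDA"]))  # connection, then external disconnect
```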
7.0.1 Broadcast client protocol
We wanted to add "radio" service by extending the TM protocol. An automated TM proxy can't broadcast. Many users can auto-connect, but only one session can be active (and choice is under proxy control). Creating a session for each user receiving a broadcast is inefficient.
TM provides a broadcast mode, in which a single source is received by all users, using a "bussing" capability in the analog switch. In TM version 2, users can't join an existing session; they must be "added" by an existing member.
We modified the TM operation by creating a Radio client. All call requests are denied by Radio; such requests are assumed to be a request to join the broadcast (Figure 16). The Radio then adds the caller to its broadcast session. Radio creates the session when the first user joins, and adds users thereafter. It tears down the session when the last user leaves, releasing the proxy resources.
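The join "trick" can be sketched as a small class. This is an optimistic sketch in the spirit of the protocol described, not the Radio implementation: the class layout and method names are hypothetical, and the use of sessionActionDenied as the denial message is an assumption.

```python
class Radio:
    """Broadcast proxy: every call request is denied and treated as a join."""

    def __init__(self):
        self.listeners = set()
        self.session_active = False

    def on_call_request(self, caller):
        """Deny the call itself, then add the caller to the broadcast session."""
        if not self.session_active:
            self.session_active = True  # first listener: create the session
        self.listeners.add(caller)
        return "sessionActionDenied"    # assumed denial message

    def on_leave(self, caller):
        """Remove a listener; tear the session down when the last one leaves."""
        self.listeners.discard(caller)
        if not self.listeners:
            self.session_active = False  # releases the proxy resources


radio = Radio()
radio.on_call_request("alice")
radio.on_call_request("bob")
radio.on_leave("alice")
print(radio.session_active, sorted(radio.listeners))
```

The sketch has no timeouts or recovery paths, mirroring the optimistic protocol that was traceable; a failed listener would remain in the session indefinitely.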
The Radio was designed to be cooperative with a Remote proxy client. The Remote client implemented the automated proxy protocol. Both were part of a single module, so shared state could permit either to acquire the external analog connection.

Figure 16 shows a simplified Radio protocol. Each transition is labelled with received message / internal action / output message. An asterisk ("*") indicates no action or message, and "to" indicates a timeout. The "trick" shown is to use call requests as implicit join requests. The states shown are unregistered, off (waiting for requests), going on, going off, and on (currently broadcasting). The protocol shown is optimistic, because an optimistic client protocol was traceable. A pessimistic protocol, with failure recovery, was not developed because of insufficient information on the TM client protocol.
We attempted to develop a more detailed Radio protocol, to better model the state transitions and eventually provide fault tolerance. In Figure 16, ON goes to GOING ON with a user call request, which is denied, and a broadcast call initiation. GOING ON should go to another separate state when REQUEST ME is received, i.e., when the TM core asks the client to join the session it initiates.
Further detail involves the fault recovery; a partial elaboration of this state diagram appears in Figure 17. This includes refining the path of transition from OFF to ON to have 2 intermediate steps; this is inferred from the protocol messages observed with the Bubble tracer. We would prefer to have built this diagram from a TM specification of the protocol, but none is documented at this time.
At this point we can see the flaw in the general TM protocol that prohibits fault tolerance in the general case. Although we may be able to determine "abort" or timeout transitions for all other states, the ON state has no timeout possible. The TM assumes fail-stay operation, which continues a call, and state machines don't interact while ON. A fault tolerant protocol requires the ON state to have a timeout, i.e., to have a "refresh connection" message to maintain state. This kind of periodic state refresh is well known in existing transport protocols (Delta-t, etc.).

The NeXT treats sound differently. The NeXT plays and records sounds via system calls, which queue sound packets, and play them via DMA transfers through the DSP chip. Unfortunately, we have not been able to accomplish even trivial bidirectional continuous digital audio, and unidirectional audio has worked only with significant (1 second) latency. Use of NeXT hardware for sound would require proprietary information on the sound system, which we could not obtain.
We have been able to modify CWI's "Radio" program (not related to the TM application we developed, also called Radio). The modified program receives vat-style packets, and plays them as they arrive. However, even on local networks with continuous packet transmission, playout is sporadic. We have not been able to correct the playout, and expect that correction would not be possible without proprietary information on the NeXT sound system.
Note that the situation is not helped by a move to NeXTStep 486. The 486 system has no sound hardware compatibility definition. The sound interface continues to be via the OS call interface, rather than via a device emulation (/dev/audio). The NeXT has an undocumented /dev/sound.
The main window does not display the "send video" options, because we were not able to port them to the NeXT. The NeXT hardware uses several different realtime video digitization boards; ours uses the NeXTDimension board. NeXT does not support the board, and the effort required to perform the port would be large and unwarranted without digital audio capability, so the port was not attempted. Another system, Digital Eyes, provides video digitization over a SCSI interface, and others are considering its use for digital teleconferencing. The color displayed is slightly green-skewed, which apparently reflects differences between co-Xist's 8-bit pseudocolor map and Sun's.

