Multimedia Human-Computer Interaction


Objectives

One of the core issues in multimedia presentations is the question of information-to-medium allocation: which information should be apportioned to which display medium? And when the information has been presented, how should the user's multimedia response be reconstituted into a single integrated message? In an ongoing study, the characteristics of information, media, and modalities are being analyzed, and algorithms for dynamic allocation and information integration are being developed.


Approach

As expert systems and other computer systems become more sophisticated, their users require more, and increasingly complex, information, and they must process it in ever shorter spans of time. At the same time, novel media with new display capabilities are continuously being created. However, determining how best to use such media to convey complex information and facilitate communication becomes an increasingly difficult problem for the system builder.

The best way to address this problem is to develop sophisticated interfaces between systems and users. On the output side, such interfaces should in some sense understand what they are presenting, as well as the environment within which the presentation is being created, so that they can display just what is pertinent as clearly as possible. On the input side, they should understand what is being said to them and the context within which it is said, so that they can accept and integrate information from a variety of sources without undue effort on the part of the user. In addition, display resources vary from one location to the next and may even vary in the same location as time passes (due, for example, to component failures). An intelligent interface must therefore be able to adapt dynamically to variations in its display resources by interactively planning its presentations.

These design constraints -- the need to understand what is being presented and input, and the need to support portability and variability of display resources -- impose a certain architecture almost by necessity. This architecture includes semantic models of the task, the media, the information, and more, as well as certain planning and information integration capabilities.

To handle the needs of dynamic and adaptive presentation planning, the system requires two linked co-reacting planners that coordinate the actions of the system's media and information sources. Together with an information integrator, they construct the Discourse Structure and the Presentation Structure that record the ongoing human-computer interaction and ensure cross-turn and cross-media coherence. As described in [Arens et al. 93b], the capabilities of each medium, the current task, and the contents of each information source are respectively semantically modeled as Virtual Devices (abstract descriptions of device I/O capabilities), task models, and abstract information types.
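
The idea of allocating information to media through abstract device descriptions can be illustrated with a minimal sketch. This is not CICERO's implementation. The class names, fields, information-type labels, and the simple preference-list rule are all assumptions made for illustration, standing in for the much richer semantic models and planners described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VirtualDevice:
    """Hypothetical abstract description of a medium's output capabilities."""
    name: str
    can_show: frozenset      # abstract information types the device can present
    available: bool = True   # False models, e.g., a component failure


@dataclass(frozen=True)
class InfoItem:
    """Hypothetical unit of information tagged with an abstract type."""
    label: str
    info_type: str


def allocate(items, devices, preferences):
    """Assign each item to the first preferred device that is both
    available and capable of showing the item's information type."""
    plan = {}
    for item in items:
        for dev_name in preferences.get(item.info_type, []):
            dev = devices.get(dev_name)
            if dev and dev.available and item.info_type in dev.can_show:
                plan[item.label] = dev.name
                break
        else:
            plan[item.label] = None  # no suitable device in this configuration
    return plan


# Illustrative data only: labels and type names are invented.
ITEMS = [InfoItem("route", "spatial"), InfoItem("fuel levels", "quantitative")]
DEVICES = {
    "map": VirtualDevice("map", frozenset({"spatial"})),
    "table": VirtualDevice("table", frozenset({"quantitative"})),
    "text": VirtualDevice("text", frozenset({"spatial", "quantitative"})),
}
PREFERENCES = {"spatial": ["map", "text"], "quantitative": ["table", "text"]}

print(allocate(ITEMS, DEVICES, PREFERENCES))

# Simulate a display failure and re-plan with the remaining resources.
degraded = dict(DEVICES, map=replace(DEVICES["map"], available=False))
print(allocate(ITEMS, degraded, PREFERENCES))
```

With all devices available, the route goes to the map and the fuel levels to the table. When the map is marked unavailable, re-running the allocation sends the route to text instead. This is a toy version of adapting a presentation to changing display resources.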

A multimedia human-computer communications manager called CICERO has been defined based on these ideas [Arens and Hovy 95]. Prototypes of parts of CICERO have been built [Arens et al. 93a]. The presentation planning and dialogue management approach is based on generalizations of text planning systems and on their extensions to the semantics of text formatting devices [Hovy and Arens 91].


Project Members

Eduard Hovy, co-project leader

Yigal Arens, co-project leader


Recent Publications

Arens, Y. and E.H. Hovy. 1995.

The Design of a Model-Based Multimedia Interaction Manager. AI Review 8(3) Special Issue on Natural Language and Vision.

We describe the conceptual design of Cicero, an application-independent human-computer interaction manager that performs run-time media coordination and allocation, so as to adapt dynamically to the presentation context; knows what it is presenting, so as to maintain coherent extended human-machine dialogues; and is plug-in compatible with host information resources such as "briefing associate" workstations, expert systems, databases, etc., as well as with multiple media such as natural language, graphics, etc. The system's design calls for two linked mutually activating planners that coordinate the actions of the system's media and information sources. To enable presentational flexibility, the capabilities of each medium and the nature of the contents of each information source are semantically modeled as Virtual Devices -- abstract representations of device I/O capabilities -- and abstract information types respectively in a single uniform knowledge representation framework. These models facilitate extensibility by supporting the specification of new interaction behaviors and the inclusion of new media and information sources.

 

Arens, Y., E.H. Hovy, and S. Van Mulken. 1993a.

Structure and Rules in Automated Multimedia Presentation Planning. Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-93. Chambéry, France.

During the planning of multimedia presentations, at least two distinct processes are required: planning the underlying discourse structure (that is, ordering and interrelating the information to be presented) and allocating the media (that is, delimiting the portions to be displayed by each individual medium). The former process has been the topic of several studies in the area of text planning, but numerous questions remain for the latter, including: What is the nature of the allocation process -- what does it start with and what does it produce? What information does it depend on? How should the two processes be performed -- sequentially, interleaved, or simultaneously? In this paper, we define Discourse Structure and Presentation Structure and outline the kinds of information that media allocation rules must depend on, including, centrally, information about the discourse structure. We describe a prototype planning system that performs the information-to-media allocation, arguing that since media allocation rules depend on the characteristics of the information to be presented, they can only be applied once the overall discourse structure has been essentially planned out and the individual portions of information have become apparent.

 

Arens, Y., E.H. Hovy, and M. Vossers. 1993b.

Describing the Presentational Knowledge Underlying Multimedia Instruction Manuals. In M. Maybury (ed), Intelligent Multimedia Interfaces. Cambridge: MIT Press.

We address one of the problems at the heart of automated multimedia presentation production and interpretation. The media allocation problem can be stated as follows: how does the producer of a presentation determine which information to allocate to which medium, and how does a perceiver recognize the function of each part as displayed in the presentation and integrate them into a coherent whole? What knowledge is used, and what processes? We describe the four major types of knowledge that play a role in the allocation problem as well as interdependencies that hold among them. We discuss two formalisms that can be used to represent this knowledge and, using examples, describe the kinds of processing required for the media allocation problem.

 

Arens, Y. and E.H. Hovy. 1993c.

The Planning Paradigm Required for Automated Multimedia Presentation Planning. Presented at the AAAI Fall Symposium on Human-Computer Interfaces, Raleigh, NC.

In this paper we argue that the planning of multimedia presentations requires at least two distinct (though interrelated) independent reactive planning processes: one to plan the underlying discourse structure (that is, to order and interrelate the information to be presented) and the other to allocate the media (that is, to delimit the portions to be displayed by each individual medium). The former process has been the topic of several studies in the area of automated text planning, in which the traditional methods of constructing tree-like plans in deliberative, top-down, planning mode have been applied with varying amounts of success. The latter process remains less clear, in part (we believe) because the deliberative planning mode is even less appropriate for it. We outline in this paper the reasoning behind our belief that neither planning process can be a simple deliberative top-down one and describe the kind of interplay between the two processes.

 

Hovy, E.H. and Y. Arens. 1991.

Automatic Generation of Formatted Text. Proceedings of the 8th AAAI Conference. Anaheim, CA.

Very few texts longer than a paragraph are written without appropriate formatting. To ensure readability, automated text generation programs must not only plan and generate their texts but be able to format them appropriately as well. We describe how work on the automated planning of multisentence text and on the display of information in a multimedia system led to the insight that text formatting devices such as footnotes, italicized regions, enumerations, etc., can be planned automatically by a text structure planning process. This is achieved by recognizing that each formatting device fulfills a specific communicative function in a text, and that such functions can be defined in terms of the text structure relations used as plans in a text planning system. An example is presented in which a text is planned from a semantic representation to a final form that includes English sentences and LaTeX formatting commands, intermingled as appropriate.

