Alexander Hauptmann
Carnegie Mellon University
http://www.cs.cmu.edu/afs/cs/user/alex/www/HomePage.html
Michael Witbrock
Carnegie Mellon University
http://www.cs.cmu.edu/~mjw/
"Integrating Speech, Natural Language, Image Processing and Information Retrieval: The Informedia Digital Video Library Project"
3/27/1997: [time not recorded]
[location not recorded]
Abstract: The Informedia Digital Video Library Project at Carnegie Mellon
University is creating large digital libraries of video and audio data
available for full content retrieval by integrating natural language
understanding, image processing, speech recognition and information
retrieval. These digital video libraries allow users to explore
multimedia data in depth as well as in breadth. The Informedia
system automatically processes and indexes video and audio sources and
allows selective retrieval of short video segments based on spoken
queries. Interactive queries allow the user to retrieve stories of
interest from all the sources that contain segments on a particular
topic. Informedia displays representative icons for relevant
segments, allowing the user to select interesting video paragraphs for
playback.
Speech recognition is a key component, together with language
processing, image processing and information retrieval. During the
Informedia library creation, speech recognition helps create
time-aligned transcripts of spoken words and helps integrate
closed-caption text where available. During library exploration by a
user, speech recognition allows the user to query the system by voice,
making the interaction simpler, more direct and immediate. Carnegie
Mellon's Sphinx-II large-vocabulary continuous speech recognition
system provides the foundation for this PC-based application.
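As an illustration of what a time-aligned transcript makes possible,
the following is a minimal sketch, not the Informedia or Sphinx-II
data format. The TimedWord record, the helper names and the sample
timings are all assumptions made up for this example. Each recognized
word carries a start and end time, so a word match can be turned
directly into a playback offset.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word with its time span (hypothetical record layout)."""
    word: str
    start: float  # seconds from the start of the video
    end: float


def word_at(transcript, t):
    """Return the word being spoken at time t, or None during silence.

    Assumes the transcript is sorted by start time.
    """
    starts = [w.start for w in transcript]
    i = bisect_right(starts, t) - 1
    if i >= 0 and transcript[i].start <= t < transcript[i].end:
        return transcript[i].word
    return None


def find_offsets(transcript, query):
    """Return the start times at which the query word is spoken."""
    q = query.lower()
    return [w.start for w in transcript if w.word.lower() == q]


# Made-up sample data for illustration only.
transcript = [
    TimedWord("digital", 0.0, 0.4),
    TimedWord("video", 0.4, 0.8),
    TimedWord("library", 0.9, 1.5),
    TimedWord("video", 2.0, 2.4),
]
```

With this layout, find_offsets(transcript, "video") gives the points
at which playback could begin, and word_at maps a playback position
back to the transcript.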
Natural language processing segments the data into paragraphs. It
also creates the summaries used for titles and video "skims", and
supports aspects of information retrieval.
Image processing identifies scene breaks and creates representative
key frames for each scene as well as for each video paragraph. In
addition, image understanding technologies allow the user to search
for similar images in the database.
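One common way to find scene breaks is to compare intensity
histograms of consecutive frames and flag a break where they differ
sharply. The sketch below illustrates that general technique only;
the abstract does not say which method Informedia uses. The frame
representation (a flat list of grayscale values), the bin count, the
threshold and the middle-frame rule for key frames are all
assumptions chosen for this example.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame (flat list of pixel values)."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def scene_breaks(frames, threshold=0.5):
    """Return the indices i at which frame i starts a new scene."""
    if not frames:
        return []
    breaks = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            breaks.append(i)
        prev = cur
    return breaks


def key_frames(frames, breaks):
    """Pick the middle frame of each scene as its representative."""
    bounds = [0] + breaks + [len(frames)]
    return [(start + end) // 2 for start, end in zip(bounds, bounds[1:])]


# Made-up sample data: three dark frames followed by two bright ones.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright]
breaks = scene_breaks(frames)
```

Here breaks marks where the bright scene begins, and
key_frames(frames, breaks) returns one frame index per scene. A fixed
threshold is the simplest choice; gradual transitions such as fades
would need a more tolerant comparison.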
Information retrieval techniques support search over all text data,
whether it comes from text transcripts, speech-recognition-generated
transcripts, OCR or human annotations.
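Searching text from several sources at once can be sketched with a
small inverted index that maps each term to the video segments
containing it. This is an illustrative sketch, not Informedia's
retrieval engine. The SegmentIndex class, the source labels, the
sample text and the ranking by raw term count are all assumptions
made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


class SegmentIndex:
    """Inverted index over text attached to video segments (hypothetical)."""

    def __init__(self):
        # term -> segment id -> number of occurrences
        self.postings = defaultdict(lambda: defaultdict(int))
        # segment id -> set of source labels that contributed text
        self.sources = defaultdict(set)

    def add(self, segment_id, source, text):
        """Index text from one source (e.g. OCR) for a segment."""
        for term in tokenize(text):
            self.postings[term][segment_id] += 1
        self.sources[segment_id].add(source)

    def search(self, query):
        """Return segment ids ranked by total count of matching terms."""
        scores = defaultdict(int)
        for term in tokenize(query):
            for segment_id, count in self.postings.get(term, {}).items():
                scores[segment_id] += count
        return sorted(scores, key=lambda s: (-scores[s], s))


# Made-up sample data for illustration only.
index = SegmentIndex()
index.add("seg1", "closed_caption", "The shuttle launch was delayed")
index.add("seg1", "ocr", "SHUTTLE LAUNCH")
index.add("seg2", "speech", "Election results tonight")
index.add("seg3", "annotation", "shuttle crew interview")
results = index.search("shuttle launch")
```

Because every source feeds the same index, a segment whose caption
and on-screen text both mention a term ranks above one that mentions
it once. Recognition errors in speech or OCR text would call for a
more forgiving match than the exact term lookup used here.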
Informedia lets users navigate the complex information space of video
data efficiently, without the time-consuming constraints of linear
access. It thus provides a new dimension in information access to
video, audio and text material.
Last updated: Mon Jun 19 17:44:06 2006