Alexander Hauptmann
Carnegie Mellon University
http://www.cs.cmu.edu/afs/cs/user/alex/www/HomePage.html

Michael Witbrock
Carnegie Mellon University
http://www.cs.cmu.edu/~mjw/


"Integrating Speech, Natural Language, Image Processing and Information Retrieval: The Informedia Digital Video Library Project"

3/27/1997: [time not recorded]
[location not recorded]

Abstract: The Informedia Digital Video Library Project at Carnegie Mellon University is creating large digital libraries of video and audio data that are available for full-content retrieval, by integrating natural language understanding, image processing, speech recognition and information retrieval. These digital video libraries allow users to explore multimedia data in depth as well as in breadth. The Informedia system automatically processes and indexes video and audio sources and allows selective retrieval of short video segments based on spoken queries. Interactive queries let the user retrieve stories of interest from all the sources that contain segments on a particular topic. Informedia displays representative icons for relevant segments, allowing the user to select interesting video paragraphs for playback.

Speech recognition is a key component, together with language processing, image processing and information retrieval. During library creation, speech recognition helps create time-aligned transcripts of the spoken words and helps integrate closed-captioned text where it is available. During library exploration, speech recognition allows the user to query the system by voice, making the interaction simpler, more direct and more immediate. Carnegie Mellon's Sphinx-II large-vocabulary continuous speech recognition system provides the foundation for this PC-based application.

Natural language processing is needed to segment the data into paragraphs. It is also used to create the summaries that serve as titles and video "skims", and for some aspects of information retrieval. Image processing identifies scene breaks and creates representative key frames for each scene as well as for each video paragraph. In addition, image understanding technologies allow the user to search for similar images in the database.

Information retrieval provides search over all text data, whether it comes from text transcripts, speech-recognition-generated transcripts, OCR or human annotations. Together, these capabilities let users navigate the complex information space of video data efficiently, without the time-consuming constraints of linear access. Informedia thus provides a new dimension in information access to video, audio and text material.
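The idea of retrieving short video segments through their time-aligned transcripts can be illustrated with a minimal sketch. This is not the Informedia implementation, which the abstract does not describe at this level; the `Segment` class, the `build_index` and `search` functions, the crude tokenizer and the match-count ranking are all hypothetical names and simplifications chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical video paragraph with a time-aligned transcript."""
    video_id: str
    start_s: float  # start time within the video, in seconds
    end_s: float    # end time within the video, in seconds
    text: str       # transcript text (spoken words, captions, OCR, notes)


def tokenize(text):
    # Lowercase and strip surrounding punctuation; deliberately crude.
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def build_index(segments):
    # Map each word to the positions of the segments that contain it.
    index = defaultdict(set)
    for pos, seg in enumerate(segments):
        for word in tokenize(seg.text):
            index[word].add(pos)
    return index


def search(index, segments, query):
    # Score each segment by how many distinct query words it contains,
    # then return segments best-first (ties keep library order).
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for pos in index.get(word, ()):
            scores[pos] += 1
    ranked = sorted(scores, key=lambda pos: (-scores[pos], pos))
    return [segments[pos] for pos in ranked]
```

Because each hit carries a `video_id` with start and end times, a player could jump straight to the matching passage rather than scanning the video linearly. A real system would need a far more robust ranking function and would have to tolerate speech recognition errors in the transcripts.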


Last updated: Mon Jun 19 17:44:06 2006