University of Southern California

Cross-source Information Extraction and Knowledge Base Population

When:
Monday, February 11, 2013, 11:00 am - 12:00 pm
Where:
ISI, 11th floor, Large Conference Room 1135
Speaker:
Prof. Heng Ji (CUNY)
Description:

 

AI SEMINAR

Abstract:

   Information Extraction (IE) is the task of identifying “facts”, such as attack/arrest events, people's jobs, people's whereabouts, and merger and acquisition activity from news, patient diagnosis history from discharge summaries, and experiment chains from scientific papers. Traditional IE techniques assess the ability to extract information from individual documents in isolation. However, users need to gather information which may be scattered among a variety of sources. These facts may be redundant, complementary, incorrect or ambiguously worded. Furthermore, the information extracted from a document may need to augment an existing Knowledge Base (KB). This requires the ability to link events, entities and associated relations in a document to KB entries, and thus presents many unique challenges. In this talk, I define several new extensions to state-of-the-art IE and systematically present the foundation, methodologies, algorithms, and implementations needed for more accurate, coherent, complete, concise, and most importantly, dynamic and resilient extraction capabilities.

   More specifically, my talk aims to answer the following questions:
   - Cross-document IE: how to extract and track various events involving important entities over time? I will present cross-source inference methods to reduce uncertainty, cross-document coreference resolution techniques to reduce redundancy, and cross-document temporal information tracking to enhance coherence and populate knowledge bases.
   - Cross-lingual IE: how to translate the extracted facts into another language accurately? I will present an information-aware machine translation (MT) framework.
   - Cross-genre IE: how to adapt methods from a formal genre to an informal genre (e.g. tweets and discussion forums)? I will present a novel cross-genre propagation framework that combines Natural Language Processing and social cognitive theories.
   - Cross-media IE: how to discover and fuse morphed and implicit information from noisy data in multiple data modalities (e.g. text, speech, image and video)? I will present a new structured representation called Multimedia Information Networks.

Bio:

Heng Ji is an assistant professor in Computer Science at Queens College, and a doctoral faculty member in the Computer Science Department and Linguistics Department at the Graduate Center of City University of New York. She received her Ph.D. in Computer Science from New York University in 2007. Her research interests focus on Natural Language Processing, especially on Cross-source Information Extraction and Knowledge Base Population. She has published over 90 papers. Her recent work on uncertainty reduction for Information Extraction was invited for publication in the Centennial Year Celebration of IEEE Proceedings. She received a Google Research Award in 2009, an NSF CAREER award in 2010, and a Sloan Junior Faculty award and an IBM Watson Faculty award in 2012. She served as the coordinator of the NIST TAC Knowledge Base Population task in 2010 and 2011, the Information Extraction area chair of NAACL-HLT 2012 and ACL 2013, and the co-leader of the information fusion task of the ARL NS-CTA program in 2011 and 2012. Her research has been funded by NSF, ARL, DARPA, Google and IBM.
