Jafar Adibi
USC/ISI
http://www.isi.edu/~adibi


"Finding Groups of Related Individuals via a Mutual Information Model"

10/17/2003: 10:30am - 12:00pm
11th Floor Large Conference Room

Abstract: Link discovery is a new challenge in data mining. Its primary concerns are to identify strong links and discover hidden relationships among entities and organizations from low-level, incomplete, and noisy evidence data, and to infer the plans and activities of interest in which they might be involved. Over the past two years we have developed a hybrid link discovery system called KOJAK that combines state-of-the-art knowledge representation and reasoning technology with statistical clustering and analysis techniques from data mining. In this talk, I will focus on some of the statistical reasoning modules of KOJAK. I will briefly introduce the link discovery challenge problems posed as part of DARPA's EELD program and illustrate some characteristics of real-world databases that a deployed link discovery system might encounter. One specific type of link discovery is group detection, that is, the problem of identifying groups of related individuals in large evidence databases. To solve this problem, we developed a new "Group Finder" module that uses a novel mutual information approach to identify strong links between entities of interest (e.g., groups of "bad guys"). The Group Finder uses a noisy channel model to handle noise, corruption, and incompleteness of evidence, and a temporal mutual information model to handle dynamic, streaming data. I will also show how we can exploit graph entropy to identify group leaders, and how we can perform object consolidation (record linking) by using a model of agent behavior. Our Group Finder module had the best combined score in this year's EELD challenge problem evaluation, and I will describe experimental results from several of the synthetic datasets used in that evaluation.
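
To make the mutual information idea concrete, the following is a minimal sketch in Python. It is not the KOJAK Group Finder itself: the abstract does not give that model's details, so the sketch simply assumes that evidence arrives as a list of events, each a set of participant names, and scores a pair of entities by the mutual information (in bits) between their binary "took part in this event" indicators. The function names, the event representation, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def mutual_information(events, a, b):
    """Mutual information (bits) between the presence indicators of a and b.

    `events` is a list of sets of participant names (an assumed format).
    """
    n = len(events)
    # Joint counts over (a present?, b present?) for every event.
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for participants in events:
        counts[(int(a in participants), int(b in participants))] += 1

    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # a zero-probability cell contributes nothing
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def rank_links(events):
    """Return all entity pairs sorted by decreasing mutual information."""
    entities = sorted(set().union(*events))
    scores = {
        (a, b): mutual_information(events, a, b)
        for a, b in combinations(entities, 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy evidence: each set lists who took part in one event.
    toy_events = [
        {"alice", "bob"},
        {"alice", "bob"},
        {"carol"},
        {"carol", "dave"},
        {"alice", "bob", "dave"},
        {"carol"},
    ]
    for pair, score in rank_links(toy_events):
        print(pair, round(score, 3))
```

Note that mutual information measures dependence in either direction, so two entities that systematically avoid each other score as highly as two that always appear together. A group detector built on this sketch would need an additional check that the pair actually co-occurs, and it does not address the noise handling, temporal modeling, or leader identification described in the abstract.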


Last updated: Mon Jun 19 17:44:06 2006