Jafar Adibi
USC/ISI
http://www.isi.edu/~adibi
"Finding Groups of Related Individuals via a Mutual Information Model"
10/17/2003: 10:30am - 12:00pm
11th Floor Large Conference Room
Abstract: Link discovery is a new challenge in data mining. Its primary
concerns are to identify strong links and discover hidden
relationships among entities and organizations based on low-level,
incomplete and noisy evidence data, and to infer plans and activities
of interest they might be involved in. During the past two years we
developed a hybrid link discovery system called KOJAK that combines
state-of-the-art knowledge representation and reasoning technology
with statistical clustering and analysis techniques from the area of
data mining. In this talk, I will focus on some of the statistical
reasoning modules of KOJAK. I will briefly introduce the link
discovery challenge problems posed by DARPA's Evidence Extraction and
Link Discovery (EELD) program
and also illustrate some characteristics of real-world databases as
they might be encountered by a deployed link discovery system.
One specific type of link discovery is group detection, that is, the
problem of identifying groups of related individuals in large evidence
databases. To address this problem, we developed a new "Group Finder" module that
uses a novel mutual information approach to identify strong links
between entities of interest (e.g., groups of "bad guys"). The Group
Finder uses a noisy channel model to handle noise, corruption and
incompleteness of evidence and a temporal mutual information model to
handle dynamic, streaming data. I will also show how we can exploit
graph entropy to identify group leaders, and how we can perform object
consolidation (record linking) by using a model of agent behavior.
Our Group Finder module had the best combined score during this year's
EELD challenge problem evaluation, and I will describe experimental
results from several of the synthetic datasets used during that
evaluation.
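The abstract does not spell out the Group Finder's formulation. The following is a minimal sketch of the general idea of scoring links by mutual information. It is not KOJAK's or the Group Finder's actual method. It assumes evidence is a list of events, each a set of participant names. It treats "x takes part in an event" as a binary random variable and estimates the mutual information between two individuals from empirical frequencies. The helper names (mutual_information, strongest_links) and the co-occurrence filter are hypothetical. The noisy channel and temporal extensions mentioned above are not modeled.

```python
import math
from itertools import combinations


def mutual_information(events, a, b):
    """Empirical mutual information (in bits) between the binary variables
    'a participates in an event' and 'b participates in an event'.
    Hypothetical helper for illustration only."""
    n = len(events)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for members in events:
        counts[(int(a in members), int(b in members))] += 1
    # Marginal counts for a and b.
    count_a = {x: counts[(x, 0)] + counts[(x, 1)] for x in (0, 1)}
    count_b = {y: counts[(0, y)] + counts[(1, y)] for y in (0, 1)}
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # 0 * log(0) is taken as 0
        p_xy = c / n
        p_x = count_a[x] / n
        p_y = count_b[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def strongest_links(events, k=3):
    """Return the k highest-MI pairs among people seen together at least once.
    The co-occurrence filter is an assumption: MI alone is symmetric and would
    also reward pairs that are strongly anti-correlated (never together)."""
    people = sorted(set().union(*events))
    scored = []
    for a, b in combinations(people, 2):
        if not any(a in m and b in m for m in events):
            continue
        scored.append(((a, b), mutual_information(events, a, b)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy, made-up evidence: each set lists the participants in one event.
    evidence = [{"alice", "bob"}, {"alice", "bob"}, {"carol"}, {"carol", "dave"}]
    for pair, score in strongest_links(evidence):
        print(pair, round(score, 3))
```

In this toy data, alice and bob always appear together, so their pair ranks above carol and dave, who co-occur only once. A real system would still need to turn pairwise scores into groups, for example by clustering the resulting weighted graph. It would also need to handle noisy or missing evidence.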
Last updated: Mon Jun 19 17:44:06 2006