| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document describes the KOJAK Group Finder which is a scalable, hybrid logic-based/statistical group detection system that can detect and extend groups in large evidence databases. For additional information on the KOJAK Group Finder see our IAAI-2004 and IA-2005 papers included in the `doc' directory of the distribution or available on the web at http://www.isi.edu/~hans/publications.html. For additional documentation on the PowerLoom knowledge representation & reasoning system underlying the logic-based portion of the Group Finder please refer to http://www.isi.edu/isd/LOOM/PowerLoom/.
The Group Finder is part of a larger hybrid link discovery system called KOJAK that we are currently developing. It combines state-of-the-art knowledge representation and reasoning (KR&R) technology with statistical clustering and analysis techniques from the area of data mining. Using KR&R technology allows us to represent extracted evidence at very high fidelity, build and utilize high quality and reusable ontologies and domain theories, have a natural means to represent abstraction and meta-knowledge such as the interestingness of certain relations, and leverage sophisticated reasoning algorithms to uncover implicit semantic connections. Using data or knowledge mining technology allows us to uncover hidden relationships not explicitly represented in the data or findable by logical inference, use clustering techniques to find sets of entities that are temporally or organizationally related, or use clustering to improve efficiency by carving up a very large data space into smaller more manageable pieces.
2.1 The Group Detection Problem
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A major problem in the area of link discovery is the discovery of hidden organizational structure such as groups and their members. There are of course many organizations and groups visible and detectable in real world data, but we are usually only interested in detecting certain types of groups such as organized crime rings, terrorist groups, etc. Group detection can be further broken down into (1) discovering hidden members of known groups (or group extension) and (2) identifying completely unknown groups.
A known group (e.g., a terrorist group such as the RAF) is identified by a given name and a set of known members. The problem then is to discover potential additional hidden members of such a group given evidence of communication events, business transactions, familial relationships, etc. For unknown groups neither name nor known members are available. All we know are certain suspicious individuals ("bad guys") in the database and their connection to certain events of interest. The main task here is to identify additional suspicious individuals and cluster them appropriately to hypothesize new real-world groups, e.g., a new money laundering ring. While our techniques address both questions, we believe group extension to be the more common and important problem.
The Group Finder is capable of finding hidden groups and group members in large evidence databases. Our group finding approach addresses a variety of important LD challenges, such as being able to exploit heterogeneous and structurally rich evidence, handling the connectivity curse, noise and corruption as well as the capability to scale up to very large, realistic data sets. The first version of the KOJAK Group Finder has been successfully tested and evaluated on a variety of synthetic datasets.
| [ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |