to ISI Home Page
isd home
About ISD
education at isd
employment
environment
news
people
research
AI Seminars
div3admin

environment
David Kempe
University of Southern California
donotspam.dkempe@usc.edu
http://www-rcf.usc.edu/~dkempe/
speaker's picture


"Inverse Clustering"   Young Stars Program

03/24/06: 10:30 AM, webcast
11th Floor Large Conference Room
Host: Patrick Pantel and Jafar Adibi, schedule

Abstract: We propose the new problem of "Inverse Clustering". Traditionally, in clustering, one is given the locations of points and needs to group them together in clusters to minimize certain natural objectives, such as average or maximum distance within clusters. In Inverse Clustering, one is given a proposed clustering as well as partial information about the (otherwise unknown) point locations; the goal is to find the best locations for the points under these constraints. We show that Inverse Clustering is a natural framework in which to cast classification problems based on data that is labeled by a union of labels. Examples include documents with sections on different topics, or words with multiple meanings. The objective of prying apart the feature vectors corresponding to the multiple labels can then be cleanly formalized as an Inverse Clustering problem. We study algorithms for, and the complexity of, different variants of the Inverse Clustering problem. We distinguish between the "strict" version, in which not only must the inferred point locations be optimal for the given clustering, but also there should not be any better clustering for those locations, and the "relaxed" version, in which the points are only required to be optimal for the clustering. For the former, we show that Inverse Clustering is Sigma_2^p-complete in the polynomial hierarchy, while most natural objectives for Relaxed Inverse Clustering can be solved optimally in polynomial time using linear programming, multivariate calculus, or convex programming. Finally, we show experiments on synthetic data demonstrating that Inverse Clustering algorithms are able to reconstruct the locations of points with fairly high accuracy, and we relate the error to parameters of the input instances.

About David Kempe: David Kempe received his Ph.D. from Cornell University in 2003, and has been on the faculty in the computer science department at USC since the Fall of 2004. His primary research interests are in computer science theory and the design and analysis of algorithms, with a particular emphasis on social networks, distributed network algorithms, and game theoretic and pricing questions. He is a recipient of the NSF CAREER award.


Last updated: Mon Jun 19 17:44:06 2006

 

 

 

 

 
USC Home Page ISI Home Page