David Kempe
University of Southern California
donotspam.dkempe@usc.edu
http://www-rcf.usc.edu/~dkempe/

"Inverse Clustering" Young Stars Program
03/24/06: 10:30 AM, webcast
11th Floor Large Conference Room
Host: Patrick Pantel and Jafar Adibi, schedule
Abstract: We propose the new problem of "Inverse Clustering".
Traditionally, in clustering, one is given the locations of points and
needs to group them together in clusters to minimize certain natural
objectives, such as average or maximum distance within clusters. In
Inverse Clustering, one is given a proposed clustering as well as
partial information about the (otherwise unknown) point locations; the
goal is to find the best locations for the points under these
constraints.
We show that Inverse Clustering is a natural framework in which to
cast classification problems based on data that is labeled by a union
of labels. Examples include documents with sections on different
topics, or words with multiple meanings. The objective of prying apart
the feature vectors corresponding to the multiple labels can then be
cleanly formalized as an Inverse Clustering problem.
We study algorithms for, and the complexity of, different variants of the
Inverse Clustering problem. We distinguish between the "strict"
version, in which not only must the inferred point locations be
optimal for the given clustering, but also there should not be any
better clustering for those locations, and the "relaxed" version,
in which the points are only required to be optimal for the
clustering. For the former, we show that Inverse Clustering is
Sigma_2^p-complete in the polynomial hierarchy, while most natural
objectives for Relaxed Inverse Clustering can be solved optimally in
polynomial time using linear programming, multivariate calculus, or
convex programming.
Finally, we show experiments on synthetic data demonstrating that
Inverse Clustering algorithms are able to reconstruct the locations of
points with fairly high accuracy, and we relate the error to parameters
of the input instances.
About David Kempe: David Kempe received his Ph.D. from Cornell University in 2003, and
has been on the faculty in the computer science department at USC
since the Fall of 2004. His primary research interests are in computer
science theory and the design and analysis of algorithms, with a
particular emphasis on social networks, distributed network
algorithms, and game theoretic and pricing questions. He is a
recipient of the NSF CAREER award.
Last updated: Mon Jun 19 17:44:06 2006
 |