Omid Madani
Yahoo! Research Labs
donotspam.Omid.Madani@overture.com
http://research.yahoo.com/staff/algorithms/madani.xml
"Co-Validation: Using Model Disagreement to Validate
Classification Algorithms"
10/22/04: 10:30 AM
11th Floor Large Conference Room
Host: Patrick Pantel
Abstract: In many applications of machine learning, labeled data is
scarce while unlabeled data is plentiful. In these settings, unlabeled
data can be utilized in a number of ways to help address the shortage
of labeled data. For example, in active learning, unlabeled instances
are selectively sampled and labeled in order to quickly improve
the accuracy of the learning algorithm while lowering labeling
costs. Other related methods include techniques for transduction
and semi-supervised induction. In this talk, I describe a new way
of utilizing unlabeled data. In the context of binary classification,
we define disagreement as a measure of how often two
independently trained models differ in their classification of
unlabeled data. The disagreement rate reflects learning-algorithm
stability, model complexity, and problem difficulty, and has several
useful properties. For example, we show that disagreement yields a
lower bound on the prediction (generalization) error, and an upper
bound on the "variance of prediction error", where variance is
measured across training sets. I will report on our empirical
experiments on a number of datasets using disagreement for error
estimation and model selection. We call the general procedure
co-validation, since the two independently trained models are
effectively used to validate one another. The procedure is
especially effective in active learning settings, where training sets
are not drawn at random and cross-validation often greatly
overestimates error. We believe that variants of co-validation may be
of great practical use when unlabeled data is plentiful. Joint work with David Pennock and Gary Flake. To appear in NIPS 2004.
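The abstract defines disagreement only in words. What follows is a minimal Python sketch under stated assumptions, not the procedure from the paper. It splits a labeled set into two disjoint halves, trains one model on each with a hypothetical nearest-centroid learner (`train_nearest_centroid`, chosen only for brevity), and measures the fraction of unlabeled points on which the two models disagree. The function names and the toy Gaussian data are illustrative. One elementary reason disagreement can bound error from below: in binary classification, two models disagree on a point exactly when one of them, and only one, is wrong. So if each independently trained model errs on x with probability p(x), the disagreement probability there is 2p(x)(1 - p(x)) ≤ 2p(x). Averaging over x gives d ≤ 2e, i.e. e ≥ d/2, where e is the expected error of a model trained on a half-sized set. The bound stated in the paper may take a different or sharper form.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, int]          # (features, binary label in {0, 1})
Classifier = Callable[[Point], int]


def train_nearest_centroid(data: Sequence[Example]) -> Classifier:
    """Hypothetical stand-in learner: predict the label of the nearest class centroid."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}  # per class: sum_x, sum_y, count
    for (x, y), label in data:
        s = sums[label]
        s[0] += x
        s[1] += y
        s[2] += 1
    # Only classes seen in training get a centroid; a half may lack one class.
    centroids = {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items() if s[2] > 0}

    def predict(p: Point) -> int:
        return min(
            centroids,
            key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
        )

    return predict


def disagreement_rate(h1: Classifier, h2: Classifier, unlabeled: Sequence[Point]) -> float:
    """Fraction of unlabeled points on which the two models' predictions differ."""
    if not unlabeled:
        raise ValueError("need at least one unlabeled point")
    return sum(h1(p) != h2(p) for p in unlabeled) / len(unlabeled)


def co_validate(
    labeled: Sequence[Example],
    unlabeled: Sequence[Point],
    learner: Callable[[Sequence[Example]], Classifier],
    rng: random.Random,
) -> float:
    """Train two models on disjoint random halves of the labeled data and
    return their disagreement rate on the unlabeled data."""
    shuffled: List[Example] = list(labeled)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    h1 = learner(shuffled[:half])
    h2 = learner(shuffled[half:])
    return disagreement_rate(h1, h2, unlabeled)


def sample(rng: random.Random, n: int) -> List[Example]:
    """Toy data (an assumption): two Gaussian blobs centred at (-1, -1) and (1, 1)."""
    points = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        points.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    labeled = sample(rng, 20)
    unlabeled = [p for p, _ in sample(rng, 1000)]
    d = co_validate(labeled, unlabeled, train_nearest_centroid, rng)
    # By the elementary argument above, d / 2 lower-bounds the expected error
    # of a model trained on half the labeled set (in expectation, not per run).
    print(f"disagreement rate: {d:.3f}; d/2 = {d / 2:.3f}")
```

Because the two halves are disjoint, each model sees only half of the labeled data, so the estimate describes a model trained on half the data rather than on the full set. A single split also gives a noisy estimate, and averaging over several random splits is one obvious refinement.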
About Omid Madani: Omid Madani earned a doctorate in computer science at the
University of Washington, and was a post-doctoral fellow at the
University of Alberta, where he won the Alberta Ingenuity
Associateship. Omid is interested in many areas of
artificial intelligence, including machine learning (utilizing
unlabeled data, incorporating prior knowledge, ...) and dynamic decision
making under uncertainty (algorithm design and analysis for MDPs). He is
a senior research scientist at Yahoo! Research Labs, applying his research to challenging and exciting problems in domains such as information retrieval and personalization.
Last updated: Mon Jun 19 17:44:06 2006