ABSTRACT:
Many applications in natural language processing are made possible by a powerful induction method: the likelihood principle. One problem with the likelihood principle, however, is that the induced model is sometimes computationally demanding, or the right likelihood function is simply difficult to find. In such cases, a new set of tools, such as information-theoretic induction methods, could offer an answer. In this talk, we propose viewing model induction as a divergence minimization problem. We touch on many aspects of this issue and briefly describe our pursuit of a new formalism that makes a unified induction theory possible. Two example applications are given as empirical support.
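As a standard illustration of why the two views are related (a textbook identity, not taken from the talk itself), maximum-likelihood estimation is already a divergence minimization: it minimizes the Kullback-Leibler divergence from the empirical distribution of the data to the model. Here $x_1, \dots, x_N$ are the observed samples, $\tilde{p}$ is their empirical distribution, and $q_\theta$ is the model family; these symbols are introduced only for this sketch.

```latex
\tilde{p}(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[x_i = x],
\qquad
D(\tilde{p} \,\|\, q_\theta)
  = \underbrace{\sum_x \tilde{p}(x) \log \tilde{p}(x)}_{\text{independent of } \theta}
  \;-\; \sum_x \tilde{p}(x) \log q_\theta(x)

\hat{\theta}
  = \arg\max_\theta \sum_{i=1}^{N} \log q_\theta(x_i)
  = \arg\max_\theta \sum_x \tilde{p}(x) \log q_\theta(x)
  = \arg\min_\theta D(\tilde{p} \,\|\, q_\theta)
```

The middle step divides the log-likelihood by the constant $N$, which does not change the maximizer.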
In the first application, unsupervised word segmentation, we introduce a highly efficient compression-based algorithm that iteratively optimizes an adaptive version of description length. This method achieves segmentation accuracy competitive with the best methods in the field, such as hierarchical Dirichlet process (HDP) models and adaptor grammars. The second application is static index pruning, the task of reducing the size of an inverted index by selectively removing less important entries. We derive two measures based on different notions of information divergence, and show that one measure is comparable to strong baseline methods while the other achieves state-of-the-art performance across different retrieval settings.
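To make the two ideas more concrete, here is a minimal Python sketch of generic versions of each; it is not the speaker's method. `description_length` computes a plain two-part MDL cost (lexicon plus unigram-coded corpus) for a candidate segmentation, not the adaptive description length mentioned above. `kl_pruning_scores` and `prune_document` rank a document's postings by each term's contribution to the KL divergence between the document's and the collection's term distributions, which is one common divergence-based pruning criterion, not necessarily either of the two measures derived in the talk. All function names and the `keep_fraction` parameter are hypothetical.

```python
import math
from collections import Counter


def description_length(segmented_corpus):
    """Two-part MDL cost, in bits, of a segmented corpus (generic, non-adaptive).

    segmented_corpus: list of utterances, each a list of word strings.
    Cost = lexicon bits (spell each distinct word with a uniform code over the
    character alphabet plus an end-of-word marker) + corpus bits (code each
    token with a unigram model estimated from the segmentation itself).
    """
    tokens = [w for utt in segmented_corpus for w in utt]
    counts = Counter(tokens)
    total = len(tokens)
    alphabet = {c for w in counts for c in w}
    bits_per_char = math.log2(len(alphabet) + 1)  # +1 for the end-of-word marker
    lexicon_bits = sum((len(w) + 1) * bits_per_char for w in counts)
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


def kl_pruning_scores(doc_tf, collection_tf):
    """Score each term of one document by p_d(t) * log2(p_d(t) / p_C(t)).

    The scores sum to KL(P_d || P_C). Assumes every document term also
    appears in collection_tf (the document is part of the collection).
    """
    doc_len = sum(doc_tf.values())
    coll_len = sum(collection_tf.values())
    scores = {}
    for term, tf in doc_tf.items():
        p_d = tf / doc_len
        p_c = collection_tf[term] / coll_len
        scores[term] = p_d * math.log2(p_d / p_c)
    return scores


def prune_document(doc_tf, collection_tf, keep_fraction=0.5):
    """Keep the highest-scoring fraction of a document's postings (at least one)."""
    scores = kl_pruning_scores(doc_tf, collection_tf)
    k = max(1, math.ceil(keep_fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])
```

For example, segmenting `thedog thecat thedog` into reusable words yields a shorter description than leaving each utterance unsegmented, and in pruning, a term that is much rarer in the collection than in the document (such as `segmentation` below) outranks common function words. Scoring per document in this way guarantees that each document keeps at least some postings; a global score threshold would be the main alternative.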
BIOGRAPHY:
Ruey-Cheng Chen is a PhD candidate in Computer Science and Information Engineering at National Taiwan University, Taiwan. In 2010 he was a visiting scholar at the Institute for Creative Technologies (ICT), University of Southern California (USC). He works primarily on natural language processing and information retrieval, with interests spanning diverse topics including query modeling, retrieval theory, and unsupervised learning. His recent work has focused on information-theoretic learning methods and their applications to natural language processing.
WEBPAGE:
http://turing.csie.ntu.edu.tw/~rueycheng
WEBCAST LINK:
http://webcasterms1.isi.edu/mediasite/Viewer/?peid=f720f7f48b744d45823597230a42ce3f1d