University of Southern California

Seminar: Unsupervised Part of Speech Induction Using Paradigmatic Representations of Word Context

When:
Thursday, January 31, 2013, 11:00 am - 12:00 pm
Where:
6th Floor Conf. Room (#689)
Speaker:
Mehmet Ali Yatbaz
Description:

Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised part of speech induction. Paradigmatic representations of word context are based on the potential substitutes of a word, in contrast to syntagmatic representations, which are based on properties of neighboring words. We demonstrate paradigmatic representations within two frameworks: (1) context clustering and (2) co-occurrence modeling. In context clustering, we cluster word contexts based on their potential substitutes, revealing a grouping that largely matches the traditional part of speech boundaries. In co-occurrence modeling, we construct a Euclidean embedding that models the co-occurrence of word types and their contexts. Clustering the points that correspond to word types in the Euclidean embedding gives state-of-the-art results in unsupervised part of speech induction, including 80% many-to-one accuracy on the Penn Treebank and improvements on 16 out of 19 corpora in 15 languages.
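For readers unfamiliar with the many-to-one accuracy figure quoted in the abstract, the following is a minimal sketch of how that metric is commonly computed: each induced cluster is mapped to the gold tag it co-occurs with most often, and tokens are scored under that mapping. The function name `many_to_one_accuracy` and the toy data are illustrative assumptions, not taken from the speaker's work.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(induced, gold):
    """Fraction of tokens tagged correctly when each induced cluster
    is mapped to the gold tag it co-occurs with most often."""
    if len(induced) != len(gold):
        raise ValueError("induced and gold must have the same length")
    if not gold:
        raise ValueError("need at least one token")

    # Count how often each gold tag appears inside each induced cluster.
    counts = defaultdict(Counter)
    for cluster, tag in zip(induced, gold):
        counts[cluster][tag] += 1

    # Each cluster contributes the token count of its majority gold tag.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(gold)


# Toy data, made up for illustration (hypothetical cluster ids and tags).
induced = [0, 0, 0, 1, 1, 2]
gold = ["NOUN", "NOUN", "VERB", "VERB", "VERB", "DET"]
print(many_to_one_accuracy(induced, gold))
```

In the toy data, cluster 0 maps to NOUN (2 of its 3 tokens), cluster 1 to VERB, and cluster 2 to DET, so 5 of 6 tokens count as correct. Because several clusters may map to the same gold tag, the score tends to rise as the number of clusters grows, which is why results are usually compared at a fixed cluster count.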

Bio: Mehmet Ali Yatbaz is a PhD candidate in Deniz Yuret's AI Lab at Koç University, Turkey. His research is on unsupervised word sense disambiguation, unsupervised morphological disambiguation, and part of speech induction. He is also a member of the Bologna Translation Service European Union project, where he is responsible for collecting, extracting, and cleaning parallel text corpora from publicly available web sites as part of the Turkish-English machine translation system.

Home Page:

http://home.ku.edu.tr/~myatbaz
