University of Southern California

NL Seminar - Large Scale Syntactic Language Modeling with Treelets

When:
Friday, February 17, 2012, 3:00 pm - 4:00 pm
Where:
11th Floor Conf. Room (#1135)
Speaker:
Adam Pauls (UC Berkeley)
Description:

Abstract: We propose a simple generative syntactic language model that conditions on overlapping tree contexts in the same way that n-gram language models condition on overlapping sentence context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on a range of grammaticality tasks, and find that we consistently outperform n-gram models and other generative baselines, and even compete with state-of-the-art discriminative models hand-designed for each task, despite training on positive data alone. We also show some improvements in preliminary machine translation experiments.
