Liang Huang's Software
The software listed here is all open source and free for academic use.
For non-academic use please contact the author.
[parser] [tagger] [forest reranker]
[cubit decoder] [sparse vector for Python]
Author: Liang Huang
This is the linear-time shift-reduce dependency parser described in
Huang and Sagae (2010),
which performs dynamic programming using a graph-structured stack (GSS),
with state-of-the-art (unlabeled) dependency parsing accuracies
on English (92.1%) and Chinese (85.2% on CTB5).
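The shift-reduce process can be made concrete with a minimal sketch of an arc-standard system that has three actions: shift, reduce-left, and reduce-right. The sketch only replays a given action sequence and builds head indices. It omits the scoring model, the beam search, and the GSS-based state merging of Huang and Sagae (2010), and the action names and `apply_actions` are hypothetical. Every complete derivation for n words uses exactly 2n - 1 actions (n shifts and n - 1 reduces), which is why search with a fixed beam width runs in linear time.

```python
# Hypothetical action names for an arc-standard shift-reduce system
# (a sketch, not the parser's actual code).
SHIFT, REDUCE_LEFT, REDUCE_RIGHT = "sh", "rl", "rr"


def apply_actions(n, actions):
    """Replay an arc-standard action sequence over words 0..n-1.

    Returns heads, where heads[i] is the index of word i's head,
    or -1 for the root of the sentence.
    """
    heads = [-1] * n
    stack = []  # indices of the heads of the partial trees on the stack
    queue = 0   # index of the next unread word
    for act in actions:
        if act == SHIFT:
            if queue >= n:
                raise ValueError("shift with an empty queue")
            stack.append(queue)
            queue += 1
        elif act in (REDUCE_LEFT, REDUCE_RIGHT):
            if len(stack) < 2:
                raise ValueError("reduce needs two trees on the stack")
            s0 = stack.pop()  # top of the stack
            s1 = stack.pop()  # second item on the stack
            if act == REDUCE_LEFT:
                heads[s1] = s0  # s1 becomes a left dependent of s0
                stack.append(s0)
            else:
                heads[s0] = s1  # s0 becomes a right dependent of s1
                stack.append(s1)
        else:
            raise ValueError("unknown action: %r" % (act,))
    if queue != n or len(stack) != 1:
        raise ValueError("incomplete derivation")
    return heads


# "I saw her": "I" and "her" both attach to "saw", the root.
words = ["I", "saw", "her"]
acts = [SHIFT, SHIFT, REDUCE_LEFT, SHIFT, REDUCE_RIGHT]
assert len(acts) == 2 * len(words) - 1  # always 2n - 1 actions
print(apply_actions(len(words), acts))  # [1, -1, 1]
```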
It comes with the following features:
- linear-time complexity and extremely fast parsing speed (~0.03 seconds per sentence)
- can output k-best trees and compute k-best oracles
- can output packed forests and compute forest oracles
- written in Python; requires Python 2.6 or later (Python 3 is not supported)
- uses very little memory both in training and decoding
- training by the (averaged) Perceptron with early updates;
also supports parallel Perceptron training (McDonald et al., 2010) on multicore machines
via Python's multiprocessing module
- supports user-defined feature templates (by automatically generating code to handle new feature combinations), which makes feature engineering much easier
- comes with my sparse vector module for Python
and the following limitations/drawbacks:
- currently requires POS-tagged text as input, as do other discriminatively trained parsers such as Malt and MST
- only does unlabeled dependency parsing
- the model contains a huge number of features and takes a while to load
- Perceptron training takes many (~30) iterations to converge (due to early updates),
which is slow in single-CPU mode.
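Early update, listed among the training features above, can be sketched on a toy incremental model. Beam search stops at the first step where the gold action prefix drops out of the beam, and the Perceptron then updates toward the gold prefix and away from the best prefix of the same length. This is a minimal sketch under stated assumptions: one action per input token stands in for the parser's shift-reduce actions, the `features` function and all other names are hypothetical, and weight averaging is left out.

```python
from collections import defaultdict

# A sketch of Perceptron training with early update on a toy model;
# all names and features here are hypothetical.


def features(x, prefix, t, action):
    """Hypothetical toy features for taking `action` at step t after `prefix`."""
    prev = prefix[-1] if prefix else "<s>"
    return [("word", x[t], action), ("prev", prev, action)]


def score(w, feats):
    return sum(w.get(f, 0.0) for f in feats)


def prefix_features(x, seq):
    """All features (with repeats) fired by every action in an action prefix."""
    feats = []
    for t, a in enumerate(seq):
        feats.extend(features(x, seq[:t], t, a))
    return feats


def beam_search(w, x, gold, actions, beam_size):
    """Beam search that stops as soon as the gold prefix falls off the beam.

    Returns (gold_prefix, best_prefix), two action prefixes of equal length.
    """
    beam = [((), 0.0)]  # (action prefix, score)
    for t in range(len(x)):
        cands = [(prefix + (a,), s + score(w, features(x, prefix, t, a)))
                 for prefix, s in beam for a in actions]
        cands.sort(key=lambda c: -c[1])  # stable sort: ties keep generation order
        beam = cands[:beam_size]
        gold_prefix = tuple(gold[:t + 1])
        if all(p != gold_prefix for p, _ in beam):
            return gold_prefix, beam[0][0]  # early update point
    return tuple(gold), beam[0][0]


def early_update(w, x, gold, actions, beam_size=4):
    """One Perceptron step with early update; returns True if w changed."""
    g, p = beam_search(w, x, gold, actions, beam_size)
    if g == p:
        return False
    for f in prefix_features(x, g):
        w[f] += 1.0
    for f in prefix_features(x, p):
        w[f] -= 1.0
    return True


w = defaultdict(float)
x, gold = ["a", "b"], ["N", "V"]
print(early_update(w, x, gold, ["V", "N"], beam_size=1))  # True: gold fell off at step 0
print(early_update(w, x, gold, ["V", "N"], beam_size=1))  # False: gold now survives
```

Because the update stops at the point of error, each training example tends to contribute shorter, more reliable updates, at the cost of needing more passes over the data.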
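The idea behind user-defined feature templates, generating code for new feature combinations, might look roughly like the following sketch. Template strings are turned into Python source for one extraction function, compiled once with `compile` and `exec`, and then called on each parser state. The template syntax (`s0.w` for the word of the stack top, `q0.t` for the tag of the first queue word, `+` for conjunction), the dictionary-based state, and `make_feature_function` are assumptions for illustration, not the parser's actual format.

```python
# A sketch of template-driven code generation; the template syntax and
# state layout are hypothetical.


def make_feature_function(templates):
    """Compile templates such as "s0.t+q0.t" into one extraction function.

    Each atom is "<slot>.<attribute>"; "+" conjoins atoms. Returns the
    compiled function and the generated source (handy for debugging).
    """
    lines = ["def extract(state):", "    feats = []"]
    for tmpl in templates:
        parts = []
        for atom in tmpl.split("+"):
            slot, attr = atom.split(".")  # raises ValueError on malformed atoms
            parts.append("state[%r][%r]" % (slot, attr))
        # One generated line per template; values are joined with "|".
        lines.append("    feats.append(%r + '=' + '|'.join((%s,)))"
                     % (tmpl, ", ".join(parts)))
    lines.append("    return feats")
    source = "\n".join(lines)
    namespace = {}
    exec(compile(source, "<feature-templates>", "exec"), namespace)
    return namespace["extract"], source


extract, src = make_feature_function(["s0.w", "s0.t+q0.t"])
state = {"s0": {"w": "saw", "t": "VBD"}, "q0": {"w": "her", "t": "PRP"}}
print(extract(state))  # ['s0.w=saw', 's0.t+q0.t=VBD|PRP']
```

Generating straight-line code once avoids interpreting the template list for every state, which matters when feature extraction runs millions of times during training.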
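Parallel Perceptron training in the style of McDonald et al. (2010) can be sketched as iterative parameter mixing. In each epoch, every data shard runs one Perceptron pass starting from the current mixed weights (in parallel via `multiprocessing.Pool.map`), and the resulting weight vectors are averaged. This sketch assumes uniform mixing and uses a plain multiclass Perceptron on toy data rather than the parser's structured model; all function names and the label set are hypothetical.

```python
from multiprocessing import Pool

# A sketch of iterative parameter mixing; names and data are hypothetical.
LABELS = ("A", "B")  # toy label set


def predict(w, feats):
    """Multiclass Perceptron prediction; ties go to the earlier label."""
    best, best_score = None, None
    for y in LABELS:
        s = sum(w.get((f, y), 0.0) for f in feats)
        if best is None or s > best_score:
            best, best_score = y, s
    return best


def train_shard(job):
    """One Perceptron epoch over one shard, starting from the mixed weights."""
    w, shard = job
    w = dict(w)  # work on a private copy
    for feats, gold in shard:
        pred = predict(w, feats)
        if pred != gold:
            for f in feats:
                w[(f, gold)] = w.get((f, gold), 0.0) + 1.0
                w[(f, pred)] = w.get((f, pred), 0.0) - 1.0
    return w


def mix(weight_list):
    """Uniform mixing: the average of the per-shard weight vectors."""
    mixed = {}
    for w in weight_list:
        for k, v in w.items():
            mixed[k] = mixed.get(k, 0.0) + v / len(weight_list)
    return mixed


def iterative_parameter_mixing(shards, epochs, processes=None):
    w = {}
    for _ in range(epochs):
        jobs = [(w, shard) for shard in shards]
        if processes:
            pool = Pool(processes)
            try:
                results = pool.map(train_shard, jobs)
            finally:
                pool.close()
                pool.join()
        else:
            results = [train_shard(job) for job in jobs]  # serial fallback
        w = mix(results)
    return w


shards = [[(["a"], "A")], [(["b"], "B")]]  # toy data: two one-example shards
w = iterative_parameter_mixing(shards, epochs=2)
print(predict(w, ["a"]), predict(w, ["b"]))  # A B
```

Passing `processes=4` distributes the shards over four worker processes; on platforms that start workers with `spawn`, the call should sit under an `if __name__ == "__main__":` guard.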
[Download]
Authors: Yang Guo and Liang Huang
This is a simple discriminatively trained trigram part-of-speech tagger
developed by MS student Yang Guo under my direction,
with state-of-the-art accuracies on English (97%)
and Chinese (92% on CTB2, 93% on CTB5). It has the following features:
- Morphology features: suffixes for English and characters for Chinese
- Very fast (0.01 seconds per sentence) with a small memory footprint
- Supports both single-CPU and multicore Perceptron training.
Unlike the parser's, training here is very fast (less than one hour)
because it converges quickly with standard (rather than early) updates: 5 iterations on a single CPU, or 40 iterations on 8-12 cores.
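The morphology features listed above can be illustrated with a small extractor. The feature names and the suffix length limit of 3 are hypothetical choices, and the code assumes Python 3 `str` (or Python 2 `unicode`) input so that iterating over a Chinese word yields characters rather than bytes.

```python
# A sketch of morphology features; feature names are hypothetical.


def morph_features(word, lang="en", max_suffix=3):
    """Hypothetical morphology features for one word.

    English: suffixes of length 1..max_suffix. Chinese: each character,
    plus the first and last characters of the word.
    """
    feats = ["w=" + word]
    if not word:
        return feats
    if lang == "en":
        for k in range(1, min(max_suffix, len(word)) + 1):
            feats.append("suf%d=%s" % (k, word[-k:]))
    elif lang == "zh":
        feats.extend("char=" + ch for ch in word)
        feats.append("first=" + word[0])
        feats.append("last=" + word[-1])
    return feats


print(morph_features("running"))
# ['w=running', 'suf1=g', 'suf2=ng', 'suf3=ing']
```

Features like these let the tagger generalize to unseen words, such as guessing a verb tag from an "-ing" suffix.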
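A trigram tagger scores each tag together with the two preceding tags, so exact decoding can be done with a second-order Viterbi algorithm whose states are tag pairs. The sketch below assumes exact Viterbi search (the actual tagger may decode differently). `local_score` is a hypothetical stand-in for the Perceptron's dot product between the weights and features such as word and suffix features, and the toy lexicon is made up for illustration.

```python
# A sketch of second-order (trigram) Viterbi decoding; names are hypothetical.
START = "<s>"  # padding tag before the sentence


def viterbi_trigram(words, tags, local_score):
    """Exact decoding for a trigram (second-order) tagger.

    local_score(words, i, t2, t1, t) scores tag t at position i given the
    two preceding tags t2, t1. Runs in O(n * |tags|^3) time.
    """
    n = len(words)
    if n == 0:
        return []
    # chart maps the state (t_{i-1}, t_i) to (best score, backpointer t_{i-2})
    chart = {(START, START): (0.0, None)}
    history = []
    for i in range(n):
        new_chart = {}
        for (t2, t1), (s, _) in chart.items():
            for t in tags:
                cand = s + local_score(words, i, t2, t1, t)
                key = (t1, t)
                if key not in new_chart or cand > new_chart[key][0]:
                    new_chart[key] = (cand, t2)
        history.append(new_chart)
        chart = new_chart
    # Pick the best final state, then follow backpointers right to left.
    (prev, last), _ = max(chart.items(), key=lambda kv: kv[1][0])
    result = [None] * n
    result[n - 1] = last
    if n >= 2:
        result[n - 2] = prev
    for i in range(n - 1, 1, -1):
        result[i - 2] = history[i][(result[i - 1], result[i])][1]
    return result


lexicon = {("the", "DT"), ("dog", "NN")}  # toy lexicon


def toy_score(words, i, t2, t1, t):
    s = 2.0 if (words[i], t) in lexicon else 0.0
    if (t2, t1, t) == ("DT", "NN", "VB"):  # toy trigram preference
        s += 1.0
    return s


print(viterbi_trigram(["the", "dog", "x"], ["DT", "NN", "VB"], toy_score))
# ['DT', 'NN', 'VB']
```

Here the unknown word "x" gets no lexical score, so only the trigram preference for DT NN VB decides its tag.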
[Download]
Liang Huang
Last modified: Thu Aug 11 16:48:16 PDT 2011