Publications
Topic specific language models built from large numbers of documents
Abstract
BACKGROUND
Natural language processing (NLP) systems, such as speech recognition, machine translation, or other text-to-text applications, typically rely on language models to allow a machine to recognize speech. The performance of these systems can be improved by customizing the model for a specific domain and/or application. A typical way of forming such a model is to base the model on text resources. For example, a model for a specific domain may be based on text resources that are specific to that domain. Sometimes, text for a target domain might be available from an institution that maintains a repository of texts, such as NIST or LDC. Other times, the data is simply collected manually.
Manual collection of data may be very difficult, and may add to system turnaround time and cost. Moreover, the amount of available data for a specific domain may be quite limited. In order to limit the effects of minimal …
- Date: June 15, 2010
- Authors: A Sethy, P Georgiou, S Narayanan
- Inventors: Abhinav Sethy, Panayiotis Georgiou, Shrikanth Narayanan
- Patent office: US
- Patent number: 7739286
- Application number: 11384226