
Topic specific language models built from large numbers of documents

Abstract

BACKGROUND
Natural language processing (NLP) systems, such as speech recognition, machine translation, or other text-to-text applications, typically rely on language models, for example to allow a machine to recognize speech. The performance of these systems can be improved by customizing the model for a specific domain and/or application. A typical way of forming such a model is to base it on text resources. For example, a model for a specific domain may be based on text resources that are specific to that domain. Sometimes, text for a target domain is available from an institution that maintains a repository of texts, such as NIST or LDC. Other times, the data is simply collected manually.
Manual collection of data may be very difficult and may add to system turnaround time and cost. Moreover, the amount of available data for a specific domain may be quite limited. In order to limit the effects of minimal …
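The background notes that a model for a specific domain may be based on text resources specific to that domain. As a minimal sketch of that general idea only (not the method claimed in this patent), the Python below estimates an add-one smoothed bigram language model from a tiny in-domain corpus and compares its perplexity on an in-domain and an out-of-domain sentence. The class name `BigramLM`, the toy air-travel sentences, the whitespace tokenizer, and the choice of Laplace smoothing are all illustrative assumptions.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def tokenize(sentence):
    # Simple whitespace tokenizer (assumption); real systems normalize text more carefully.
    return sentence.lower().split()


class BigramLM:
    """Add-one (Laplace) smoothed bigram model estimated from in-domain text.

    Hypothetical illustrative sketch; not the method described in the patent.
    """

    def __init__(self, sentences):
        self.history = Counter()  # c(prev): times `prev` is followed by another token
        self.bigrams = Counter()  # c(prev, word)
        vocab = {EOS, UNK}        # every token the model can predict
        for sentence in sentences:
            words = tokenize(sentence)
            vocab.update(words)
            tokens = [BOS] + words + [EOS]
            for prev, word in zip(tokens, tokens[1:]):
                self.history[prev] += 1
                self.bigrams[(prev, word)] += 1
        self.vocab = vocab

    def _map(self, token):
        # Map out-of-vocabulary tokens to <unk>.
        return token if token in self.vocab else UNK

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|); <s> is allowed only as history.
        prev = prev if prev == BOS else self._map(prev)
        word = self._map(word)
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def perplexity(self, sentence):
        # Per-token perplexity, counting the end-of-sentence prediction.
        tokens = [BOS] + tokenize(sentence) + [EOS]
        pairs = list(zip(tokens, tokens[1:]))
        log_prob = sum(math.log(self.prob(p, w)) for p, w in pairs)
        return math.exp(-log_prob / len(pairs))


# Hypothetical toy in-domain corpus (air-travel queries), for illustration only.
in_domain = [
    "book a flight to boston",
    "book a flight to denver",
    "show flights to boston tomorrow",
]
lm = BigramLM(in_domain)
print(f"in-domain:     {lm.perplexity('book a flight to denver'):.2f}")
print(f"out-of-domain: {lm.perplexity('the weather is nice today'):.2f}")
```

The in-domain sentence receives a lower perplexity than the out-of-domain one, which is the effect domain customization aims for. Add-one smoothing keeps the sketch short but is crude; when in-domain data is limited, as the background points out, practical systems usually rely on better smoothing (for example Kneser-Ney) or interpolate the domain model with a general-purpose one.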

Date
June 15, 2010
Authors
A Sethy, P Georgiou, S Narayanan
Inventors
Abhinav Sethy, Panayiotis Georgiou, Shrikanth Narayanan
Patent office
US
Patent number
7739286
Application number
11384226