Character duration modeling for speed improvements in the BBN Byblos OCR system
Abstract
In this paper, we describe a recent enhancement to our HMM-based OCR system that significantly increases the speed of the system without any impact on recognition accuracy. Recognition speed is, in part, a function of the number of distinct HMMs that constitute the model set. As a result, recognition is much slower for ideographic scripts such as Chinese and Japanese, which contain thousands of glyphs, than for alphabetic scripts such as Latin and Arabic. In our current OCR system, methods such as sub-character modeling and Gaussian shortlists are used to reduce processing time. In this paper, we describe a simple character-based duration modeling technique that constrains the number of frames for which a character can stay active. Character durations were obtained from automatically labeled training data, and a probability mass function (histogram) was used to …
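To make the idea of a per-character duration constraint concrete, here is a minimal sketch. It assumes the labeled training data can be reduced to `(character, num_frames)` pairs and builds a histogram PMF of frame durations per character. The abstract is cut off before saying how the histogram was used, so the rule that turns the PMF into a frame limit (the smallest duration covering a hypothetical `coverage` fraction of the training mass) is purely an illustrative assumption, not the authors' method. The function names `duration_pmfs`, `max_duration`, and `prune_active` are likewise hypothetical.

```python
from collections import Counter, defaultdict


def duration_pmfs(segments):
    """Build a per-character duration PMF (normalized histogram).

    segments: iterable of (char, num_frames) pairs, assumed to come from
    automatically labeled training data.
    """
    counts = defaultdict(Counter)
    for char, frames in segments:
        counts[char][frames] += 1
    pmfs = {}
    for char, hist in counts.items():
        total = sum(hist.values())
        # Map each observed duration (in frames) to its relative frequency.
        pmfs[char] = {d: c / total for d, c in sorted(hist.items())}
    return pmfs


def max_duration(pmf, coverage=0.99):
    """Hypothetical cutoff rule: smallest d with P(D <= d) >= coverage."""
    cumulative = 0.0
    for d in sorted(pmf):
        cumulative += pmf[d]
        if cumulative >= coverage:
            return d
    # Guard against floating-point sums that fall just short of 1.0.
    return max(pmf)


def prune_active(active, limits):
    """Drop hypotheses whose character has stayed active too long.

    active: dict mapping char -> frames it has been active so far.
    limits: dict mapping char -> maximum allowed frames; characters with
    no recorded limit are left unconstrained.
    """
    return {c: n for c, n in active.items() if n <= limits.get(c, float("inf"))}


if __name__ == "__main__":
    pmfs = duration_pmfs([("a", 3), ("a", 3), ("a", 4), ("a", 10), ("b", 2)])
    limits = {c: max_duration(p, coverage=0.75) for c, p in pmfs.items()}
    print(limits)
    print(prune_active({"a": 5, "b": 2}, limits))
```

In a real decoder the limit would be checked as frames are consumed, so that a hypothesis exceeding its character's cap is never extended further. Pruning such hypotheses early is one way a duration constraint can reduce search effort.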
- Date: August 31, 2005
- Authors: Premkumar Natarajan, Ram Sundaram, Rohit Prasad, Ehry MacRostie
- Conference: Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
- Pages: 1136-1140
- Publisher: IEEE