Publications
Recognition of multi-oriented, multi-sized, and curved text
Abstract
Recognizing text is difficult in documents that contain multi-oriented, curved text lines with characters of various sizes. This is because layout analysis techniques, which most optical character recognition (OCR) approaches rely on, do not work well on unstructured documents with non-homogeneous text. Previous work on recognizing non-homogeneous text typically handles only specific cases, such as horizontal and/or straight text lines and single-sized characters. In this paper, we present a general text recognition technique that handles non-homogeneous text by exploiting dynamic character grouping criteria based on the character sizes and the maximum desired string curvature. This technique can be easily integrated with classic OCR approaches to recognize non-homogeneous text. In our experiments, we compared our approach to a commercial OCR product using a variety of raster maps that contain multi-oriented …
- Date: September 18, 2011
- Authors: Yao-Yi Chiang, Craig A Knoblock
- Conference: 2011 International Conference on Document Analysis and Recognition
- Pages: 1399-1403
- Publisher: IEEE
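
The abstract names two grouping criteria, character size and maximum string curvature, without giving their details. The following is a minimal sketch of how such criteria could be combined, not the paper's algorithm: the `Char` type, the function names, the greedy nearest-neighbor strategy, and every threshold (`max_ratio`, `spacing`, `max_turn_deg`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """A detected character: center coordinates and a size measure (hypothetical)."""
    x: float
    y: float
    size: float  # e.g. the longer side of the character's bounding box


def similar_size(a, b, max_ratio=1.5):
    """Size criterion: the larger character is at most max_ratio times the smaller."""
    return max(a.size, b.size) <= max_ratio * min(a.size, b.size)


def close_enough(a, b, spacing=1.5):
    """Distance criterion scaled by character size, so it adapts to each pair."""
    return math.hypot(a.x - b.x, a.y - b.y) <= spacing * max(a.size, b.size)


def turn_angle(a, b, c):
    """Angle in degrees between the direction a->b and the direction b->c."""
    h1 = math.atan2(b.y - a.y, b.x - a.x)
    h2 = math.atan2(c.y - b.y, c.x - b.x)
    d = abs(h2 - h1) % (2 * math.pi)
    return math.degrees(min(d, 2 * math.pi - d))


def grow_string(start, chars, max_turn_deg=30.0):
    """Greedily extend a string from `start`, one nearest valid character at a time.

    A candidate is valid if it passes the size and distance criteria against the
    last character and does not bend the string by more than max_turn_deg.
    """
    string = [start]
    remaining = [c for c in chars if c != start]
    while True:
        last = string[-1]
        options = [
            c for c in remaining
            if similar_size(last, c)
            and close_enough(last, c)
            and (len(string) < 2
                 or turn_angle(string[-2], last, c) <= max_turn_deg)
        ]
        if not options:
            return string
        nxt = min(options, key=lambda c: math.hypot(c.x - last.x, c.y - last.y))
        string.append(nxt)
        remaining.remove(nxt)
```

Under these assumptions, characters on a gently curving line are chained together, while a nearby character of a very different size or one that would force a sharp bend is left out. The resulting string could then be straightened and passed to a conventional OCR engine, which is the kind of integration the abstract describes.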