Publications

The BBN document analysis service: a platform for multilingual document translation

Abstract

In this paper, we introduce a new operational platform for end-to-end document image analysis, recognition, and machine translation. The Raytheon BBN Document Analysis Service (BBN DAS) performs the following operations on scanned machine-print document images: (1) image pre-processing and segmentation to identify homogeneous zones of text, (2) text recognition to convert the text zones into electronic text, (3) machine translation for converting the text from the native language of the document into English, and (4) document archiving and indexing for effective content-based search. BBN DAS uses a service-oriented architecture (SOA), which offers modularity and scalability for operation on hardware configurations ranging from a laptop to distributed multi-node server environments. This paper describes the platform architecture, the process of configuring it for Arabic newsprint documents and resulting …

Date: June 9, 2010
Authors: Ehry MacRostie, Rohit Prasad, Stephen Rawls, Matin Kamali, Huaigu Cao, Krishna Subramanian, Prem Natarajan
Book: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Pages: 447-454