Publications

Unsupervised data processing for classifier-based speech translator

Abstract

Concept classification has been used as a translation method and has shown notable benefits within the suite of speech-to-speech translation applications. However, the main bottleneck in achieving acceptable performance with such classifiers is the cumbersome task of annotating large amounts of training data. Any attempt to develop a method that assists in, or completely automates, data annotation needs a distance measure that compares sentences by the concept they convey. Here, we introduce a new method of sentence comparison that is motivated by the translation point of view. In this method, the imperfect translations produced by a phrase-based statistical machine translation system are used to compare the concepts of the source sentences. Three clustering methods are adapted to support the concept-based distance. These methods are applied to prepare groups of paraphrases and use them …
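
The idea of comparing source sentences through their machine translations can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state the actual distance measure or name the three clustering methods, so the sketch substitutes a token-set Jaccard distance and a simple single-link threshold grouping. The function names, the threshold value, and the toy translations are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Token-set Jaccard distance between two translated sentences.

    Stand-in measure (assumption); the paper's own distance is not
    given in the abstract.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_by_translation(
    translations: dict[str, str], threshold: float = 0.5
) -> list[set[str]]:
    """Group source sentences whose translations lie within `threshold`.

    Single-link grouping via union-find; a stand-in (assumption) for the
    clustering methods mentioned in the abstract.
    """
    parent = {src: src for src in translations}

    def find(src: str) -> str:
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[src] != src:
            parent[src] = parent[parent[src]]
            src = parent[src]
        return src

    # Merge every pair of source sentences whose translations are close.
    for s1, s2 in combinations(translations, 2):
        if jaccard_distance(translations[s1], translations[s2]) <= threshold:
            parent[find(s1)] = find(s2)

    groups: dict[str, set[str]] = {}
    for src in translations:
        groups.setdefault(find(src), set()).add(src)
    return list(groups.values())


# Hypothetical toy data: source sentence -> (imperfect) MT output.
toy_translations = {
    "where does it hurt": "donde le duele",
    "where is the pain": "donde le duele el dolor",
    "what is your name": "como se llama usted",
}

for group in cluster_by_translation(toy_translations):
    print(sorted(group))
```

In this toy run the two pain-related questions share most of their translated tokens and fall into one group, while the unrelated question stays on its own. Such groups are the kind of paraphrase sets that could serve as candidate training classes for a concept classifier, which is the use the abstract describes.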

Date
2013
Authors
Emil Ettelaie, Panayiotis G Georgiou, Shrikanth S Narayanan
Journal
Computer Speech & Language
Volume
27
Issue
2
Pages
438-454
Publisher
Academic Press