Classification of clean and noisy bilingual movie audio for speech-to-speech translation corpora design

Abstract

Identifying suitable sources of bilingual audio and text data is a crucial part of statistical speech-to-speech (S2S) translation research and development. Movies, which are often dubbed into other languages, are a good source for this purpose, but not all of the data are directly usable because of noise and other differences in audio conditions. Hence, it becomes crucial to automatically select the bilingual audio data that are suitable for analysis and for training S2S systems for specific environments. In this work, we extract bilingual speech segments from movies and aim to classify each segment as either clean speech or speech with background noise (e.g., music or babble noise). We examine various features for this problem, and our best-performing method achieves up to 87% accuracy in discriminating clean from noisy speech in bilingual data.

Date: 2014
Authors: Andreas Tsiartas, Prasanta Kumar Ghosh, Panayiotis Georgiou, Shrikanth Narayanan
Conference: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages: 121-125
Publisher: IEEE