Text-independent voice conversion based on unit selection

Abstract

So far, most voice conversion training procedures are text-dependent, i.e., they are based on parallel training utterances of source and target speaker. Since several applications (e.g., speech-to-speech translation or dubbing) require text-independent training, training techniques that use non-parallel data have been proposed over the last two years. In this paper, we present a new approach that applies unit selection to find corresponding time frames in source and target speech. A subjective experiment shows that this technique achieves the same performance as conventional text-dependent training.

Date: May 14, 2006
Authors: David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Alan Black, Shri Narayanan
Conference: 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
Volume: 1
Pages: I-I
Publisher: IEEE
