Publications

Text-independent voice conversion based on unit selection

Abstract

So far, most voice conversion training procedures are text-dependent, i.e., they are based on parallel training utterances of the source and target speakers. Since several applications (e.g., speech-to-speech translation or dubbing) require text-independent training, training techniques that use non-parallel data have been proposed over the last two years. In this paper, we present a new approach that applies unit selection to find corresponding time frames in source and target speech. A subjective experiment shows that this technique achieves the same performance as conventional text-dependent training.
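The abstract does not spell out the cost functions used for unit selection. As a rough illustration of how unit selection can pair source frames with target frames, the sketch below assumes that each frame is a feature vector (for example, cepstral coefficients), that the target cost is the Euclidean distance between a source frame and a candidate target frame, and that the concatenation cost is the Euclidean distance between consecutively selected target frames, scaled by a weight. The function name `select_target_frames` and the weight `alpha` are hypothetical, and the cost definitions are assumptions rather than necessarily those of the paper. The minimizing frame sequence is found exactly with Viterbi-style dynamic programming.

```python
import numpy as np


def select_target_frames(source, target, alpha=1.0):
    """Pick one target frame per source frame by unit selection (illustrative sketch).

    source: (N, D) array of source-speaker feature vectors.
    target: (M, D) array of target-speaker feature vectors.
    alpha:  hypothetical weight on the concatenation cost.

    Minimizes  sum_n ||source[n] - target[k_n]||
             + alpha * sum_{n>0} ||target[k_{n-1}] - target[k_n]||
    over all index sequences k_0..k_{N-1} and returns that sequence.
    The cost definitions here are assumptions for illustration.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n_src, n_tgt = len(source), len(target)

    # Target cost: distance from every source frame to every target frame, shape (N, M).
    tcost = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    # Concatenation cost between every pair of target frames, shape (M, M).
    ccost = alpha * np.linalg.norm(target[:, None, :] - target[None, :, :], axis=-1)

    acc = tcost[0].copy()                       # best accumulated cost ending in each target frame
    back = np.zeros((n_src, n_tgt), dtype=int)  # backpointers to the best previous target frame
    for n in range(1, n_src):
        # total[j, k]: best cost ending in frame j at step n-1, then moving to frame k.
        total = acc[:, None] + ccost
        back[n] = np.argmin(total, axis=0)
        acc = total[back[n], np.arange(n_tgt)] + tcost[n]

    # Backtrack from the cheapest final target frame.
    path = [int(np.argmin(acc))]
    for n in range(n_src - 1, 0, -1):
        path.append(int(back[n, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    src = [[0.0], [1.0], [2.0]]
    tgt = [[2.1], [0.1], [1.1], [5.0]]
    print(select_target_frames(src, tgt, alpha=0.0))    # [1, 2, 0]
    print(select_target_frames(src, tgt, alpha=100.0))  # [2, 2, 2]
```

With `alpha=0` each source frame simply maps to its nearest target frame. Increasing `alpha` trades acoustic closeness for a smoother sequence of selected target frames, which is the usual motivation for adding a concatenation cost in unit selection.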

Date
May 14, 2006
Authors
David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Alan Black, Shri Narayanan
Conference
2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
Volume
1
Pages
I-I
Publisher
IEEE