Publications

Emotion to emotion speech conversion in phoneme level

Abstract

The ability to synthesize emotional speech can make human–machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of prosodic and spectral modification at the phoneme level for emotion‐to‐emotion speech conversion. Prosody modification is performed with the TD‐PSOLA algorithm (Moulines and Charpentier, 1990). The spectral envelopes of source phonemes are also transformed to match those of target phonemes using an LPC‐based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from statistics obtained by analyzing an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native American English speakers indicate that the modification of prosody only or spectrum only is not …
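As a rough illustration of how per-phoneme prosodic statistics might be turned into modification targets, the Python sketch below derives scaling factors for F0, duration, and energy as ratios of target-emotion means to source-emotion means. The ratio-of-means rule, the `ProsodyStats` type, the `modification_factors` function, and every numeric value are assumptions made for illustration; the abstract does not state how the statistics were applied, and none of the numbers come from the study's database.

```python
from dataclasses import dataclass


@dataclass
class ProsodyStats:
    """Hypothetical per-phoneme averages for one emotion category."""
    mean_f0_hz: float
    mean_duration_s: float
    mean_energy: float


def modification_factors(source: ProsodyStats, target: ProsodyStats) -> dict:
    """Return target/source ratios for each prosodic parameter.

    Assumption: a simple ratio of means; the study may have used a
    different estimation rule.
    """
    return {
        "f0_scale": target.mean_f0_hz / source.mean_f0_hz,
        "duration_scale": target.mean_duration_s / source.mean_duration_s,
        "energy_scale": target.mean_energy / source.mean_energy,
    }


# Placeholder values only, not measurements from the paper.
neutral_stats = ProsodyStats(mean_f0_hz=100.0, mean_duration_s=0.125, mean_energy=1.0)
target_stats = ProsodyStats(mean_f0_hz=150.0, mean_duration_s=0.0625, mean_energy=2.0)

factors = modification_factors(neutral_stats, target_stats)
print(factors)
```

In a pipeline of the kind the abstract describes, factors like these would presumably be passed to a pitch- and time-scale modification stage such as TD‐PSOLA; that stage is not implemented here.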

Date
2004
Authors
Murtaza Bulut, Serdar Yildirim, Carlos Busso, Chul Min Lee, Ebrahim Kazemzadeh, Sungbok Lee, Shrikanth Narayanan
Journal
The Journal of the Acoustical Society of America
Volume
116
Issue
4_Supplement
Pages
2481-2481
Publisher
Acoustical Society of America