Emotion to emotion speech conversion in phoneme level

Abstract

The ability to synthesize emotional speech can make human–machine interaction more natural in spoken dialogue management. This study investigates the effectiveness of prosodic and spectral modification at the phoneme level for emotion‐to‐emotion speech conversion. Prosody modification is performed with the TD‐PSOLA algorithm (Moulines and Charpentier, 1990). We also transform the spectral envelopes of source phonemes to match those of target phonemes using an LPC‐based spectral transformation approach (Kain, 2001). Prosodic speech parameters (F0, duration, and energy) for target phonemes are estimated from statistics obtained by analyzing an emotional speech database of happy, angry, sad, and neutral utterances collected from actors. Listening experiments conducted with native American English speakers indicate that the modification of prosody only or spectrum only is not …

Date: 2004
Authors: Murtaza Bulut, Serdar Yildirim, Carlos Busso, Chul Min Lee, Ebrahim Kazemzadeh, Sungbok Lee, Shrikanth Narayanan
Journal: The Journal of the Acoustical Society of America
Volume: 116
Issue: 4_Supplement
Pages: 2481-2481
Publisher: Acoustical Society of America
