Publications

USC-EMO-MRI corpus: An emotional speech production database recorded by real-time magnetic resonance imaging

Abstract

This paper introduces a new multimodal database of emotional speech production recorded using real-time magnetic resonance imaging. The corpus contains magnetic resonance (MR) videos of five male and five female speakers, together with evaluations of the emotion quality of each sentence-level utterance, performed by at least 10 listeners. Both the speakers and the listeners are professional actors/actresses. The MR videos contain MR image sequences of the entire upper airway in the mid-sagittal plane, with synchronized, noise-cancelled speech audio. The stimuli comprise the “Grandfather” passage and seven sentences. A single repetition of the passage and five repetitions of the sentences were recorded five times, each time with a different acted emotion. The four target emotions are anger, happiness, sadness, and neutrality (no emotion). Additionally, one repetition of the Grandfather passage was recorded with neutral emotion at a fast speaking rate, as opposed to the natural speaking rate used for the rest of the recordings. The paper also includes a preliminary analysis of the MR images illustrating how vocal tract configurations, measured in terms of distances between the inner and outer vocal-tract walls along the tract, vary as a function of emotion.
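The wall-to-wall distance measure mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual analysis pipeline: it assumes that paired inner- and outer-wall contour points have already been extracted from each mid-sagittal MR frame (the function name and the pairing convention are hypothetical).

```python
import numpy as np

def vocal_tract_distances(inner, outer):
    """Euclidean distances between paired inner and outer
    vocal-tract wall points along the tract.

    `inner` and `outer` are (N, 2) arrays of mid-sagittal
    (x, y) coordinates; point i of `inner` is assumed to be
    paired with point i of `outer` by a prior contour-
    extraction step. Returns an (N,) array: the vocal-tract
    distance function for one MR frame.
    """
    inner = np.asarray(inner, dtype=float)
    outer = np.asarray(outer, dtype=float)
    return np.linalg.norm(outer - inner, axis=1)
```

A per-emotion comparison would then average these distance functions over the frames of each utterance and contrast the averages across the acted emotions.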

Date
2014
Authors
Jangwon Kim, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Sungbok Lee, Shrikanth Narayanan
Journal
International Seminar on Speech Production (ISSP), Cologne, Germany
Volume
226