Publications

A hierarchical framework for modeling multimodality and emotional evolution in affective dialogs

Abstract

Incorporating multimodal information and temporal context from speakers during an emotional dialog can improve the performance of automatic emotion recognition systems. Motivated by these observations, we propose a hierarchical framework that models emotional evolution both within and between utterances, i.e., at the utterance and dialog levels respectively. Our approach can incorporate a variety of generative or discriminative classifiers at each level and provides flexibility and extensibility in terms of multimodal fusion: facial, vocal, head, and hand movement cues can be included and fused according to the modality and the emotion classification task. Our results on the multimodal, multi-speaker IEMOCAP database indicate that this framework is well suited to cases where emotions are expressed multimodally and in context, as in many real-life situations.

Date
2012
Authors
Angeliki Metallinou, Athanasios Katsamanis, Shrikanth Narayanan
Conference
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages
2401-2404
Publisher
IEEE
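The two-level idea in the abstract can be illustrated with a minimal sketch: a level-1 scorer assigns per-utterance emotion scores from fused multimodal features, and a level-2 decoder smooths those scores over the dialog to capture emotional evolution between utterances. Everything below is a hypothetical toy (nearest-centroid scoring, Viterbi-style decoding, invented emotion labels and data), not the paper's actual classifiers or features.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def utterance_scores(features, centroids):
    """Level 1 (utterance level): per-utterance class scores from
    fused multimodal features. Toy nearest-centroid scorer; the paper
    allows arbitrary generative or discriminative classifiers here."""
    dists = np.linalg.norm(centroids - features, axis=1)
    scores = np.exp(-dists)
    return scores / scores.sum()

def dialog_decode(score_seq, transition):
    """Level 2 (dialog level): Viterbi-style decoding over the
    utterance-level scores, modeling emotional evolution between
    utterances via a class-transition matrix."""
    n, k = len(score_seq), len(EMOTIONS)
    logp = np.log(np.vstack(score_seq) + 1e-12)
    logt = np.log(transition + 1e-12)
    best = np.zeros((n, k))
    back = np.zeros((n, k), dtype=int)
    best[0] = logp[0]
    for t in range(1, n):
        cand = best[t - 1][:, None] + logt   # cand[prev, cur]
        back[t] = cand.argmax(axis=0)
        best[t] = cand.max(axis=0) + logp[t]
    path = [int(best[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [EMOTIONS[i] for i in reversed(path)]

# Toy usage: one centroid per emotion, a 3-utterance dialog whose
# features sit exactly on the "angry", "angry", "happy" centroids,
# and a mildly sticky transition matrix.
centroids = np.eye(4)
feats = [centroids[0], centroids[0], centroids[1]]
transition = np.full((4, 4), 0.2)
np.fill_diagonal(transition, 0.4)
scores = [utterance_scores(f, centroids) for f in feats]
labels = dialog_decode(scores, transition)  # → ['angry', 'angry', 'happy']
```

With a stickier transition matrix the level-2 decoder would instead favor keeping a single emotion across the dialog, which is exactly the kind of temporal-context trade-off the hierarchical framework is designed to model.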