Publications

Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling

Abstract

In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information to model the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally outperforms standard classification techniques such as Hidden Markov Models or Support Vector Machines, achieving F1-measures on the order of 72%, 65%, and 55% for discriminating three clusters in emotional space, three levels of valence, and three levels of activation, respectively.
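To illustrate the general idea of feature-level fusion followed by bidirectional LSTM modeling, the sketch below shows a minimal PyTorch model that concatenates acoustic and visual feature frames and tags each frame with one of three emotion classes. This is not the authors' implementation; the framework, feature dimensions, and layer sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn


class BLSTMEmotionTagger(nn.Module):
    """Illustrative sketch: a bidirectional LSTM over feature-level fused
    audio-visual frames, producing per-frame scores for three emotion classes.
    All dimensions below are hypothetical, not taken from the paper."""

    def __init__(self, acoustic_dim=39, visual_dim=20, hidden_dim=128, num_classes=3):
        super().__init__()
        fused_dim = acoustic_dim + visual_dim  # feature-level fusion via concatenation
        self.blstm = nn.LSTM(
            input_size=fused_dim,
            hidden_size=hidden_dim,
            batch_first=True,
            bidirectional=True,  # use both past and future context at each frame
        )
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, acoustic, visual):
        # acoustic: (batch, time, acoustic_dim); visual: (batch, time, visual_dim)
        fused = torch.cat([acoustic, visual], dim=-1)
        context, _ = self.blstm(fused)      # (batch, time, 2 * hidden_dim)
        return self.classifier(context)     # per-frame class logits


if __name__ == "__main__":
    model = BLSTMEmotionTagger()
    audio = torch.randn(4, 100, 39)  # e.g. frame-level acoustic features
    video = torch.randn(4, 100, 20)  # e.g. facial feature descriptors
    logits = model(audio, video)
    print(logits.shape)  # torch.Size([4, 100, 3])
```

The bidirectional recurrence is what gives the model access to long-range context in both directions, which the abstract identifies as the key difference from frame-wise classifiers such as SVMs.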

Date
September 27, 2025
Authors
Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn Schuller, Shrikanth Narayanan