A multimodal mixture-of-experts model for dynamic emotion prediction in movies

Abstract

This paper addresses the problem of continuous emotion prediction in movies from multimodal cues. The rich emotional content of movies is inherently multimodal: emotion is evoked through both the audio (music, speech) and video modalities. To capture such affective information, we put forth a set of audio and video features that includes several novel ones, such as Video Compressibility and the Histogram of Facial Area (HFA). We propose a Mixture of Experts (MoE)-based fusion model that dynamically combines information from the audio and video modalities to predict the emotion evoked in movies. A learning module based on the hard Expectation-Maximization (EM) algorithm is presented for the MoE model. Experiments on a database of popular movies demonstrate that our MoE-based fusion method outperforms popular fusion strategies (e.g., early and late fusion) in the context of dynamic emotion …
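To make the idea of MoE fusion trained with hard EM more concrete, here is a minimal sketch. It is not the authors' implementation, and it makes several assumptions. There is one linear (ridge) expert per modality. A softmax gate operates over the concatenated audio and video features. The simplified hard E-step assigns each sample to the expert with the smallest squared error, which is the MAP assignment under equal gate priors and a shared Gaussian noise variance. All function names (`fit_ridge`, `train_moe_hard_em`, `predict_moe`, etc.) are hypothetical, and the data is synthetic. The demo compares the model against an early-fusion baseline, meaning a single linear model on the concatenated features.

```python
import numpy as np

def _add_bias(X):
    # Append a constant column so each linear model learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y.
    Xb = _add_bias(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def predict_linear(w, X):
    return _add_bias(X) @ w

def softmax(S):
    S = S - S.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def fit_gate(Z, labels, n_experts, lr=0.5, epochs=500):
    # Assumption: a softmax (multinomial logistic) gate trained by batch
    # gradient descent to predict which expert each sample was assigned to.
    Zb = _add_bias(Z)
    W = np.zeros((Zb.shape[1], n_experts))
    Y = np.eye(n_experts)[labels]  # one-hot targets
    for _ in range(epochs):
        P = softmax(Zb @ W)
        W -= lr * Zb.T @ (P - Y) / Zb.shape[0]
    return W

def train_moe_hard_em(feats, y, n_iter=10):
    # feats: one feature matrix per modality, e.g. [X_audio, X_video].
    experts = [fit_ridge(X, y) for X in feats]  # initialise on all samples
    for _ in range(n_iter):
        # Hard E-step (simplified assumption): assign each sample to the
        # expert with the smallest squared error.
        errs = np.stack([(predict_linear(w, X) - y) ** 2
                         for w, X in zip(experts, feats)], axis=1)
        labels = errs.argmin(axis=1)
        # M-step: refit each expert only on the samples assigned to it.
        for k, X in enumerate(feats):
            mask = labels == k
            if mask.sum() > X.shape[1] + 1:  # keep old weights if too few samples
                experts[k] = fit_ridge(X[mask], y[mask])
    # The gate sees all modalities and learns the final hard assignments.
    gate = fit_gate(np.hstack(feats), labels, len(feats))
    return experts, gate

def predict_moe(experts, gate, feats):
    # Dynamic fusion: gate-weighted sum of the per-modality expert outputs.
    G = softmax(_add_bias(np.hstack(feats)) @ gate)
    preds = np.stack([predict_linear(w, X) for w, X in zip(experts, feats)], axis=1)
    return (G * preds).sum(axis=1)

# Synthetic demo (hypothetical data): the target follows the audio features
# when X_video[:, 1] <= 0 and the video features otherwise.
rng = np.random.default_rng(0)
n = 2000
X_audio = rng.normal(size=(n, 2))
X_video = rng.normal(size=(n, 2))
video_regime = X_video[:, 1] > 0
y = np.where(video_regime, 3.0 * X_video[:, 0], X_audio @ np.array([1.5, -2.0]))
y = y + 0.1 * rng.normal(size=n)

experts, gate = train_moe_hard_em([X_audio, X_video], y)
moe_mse = np.mean((predict_moe(experts, gate, [X_audio, X_video]) - y) ** 2)

# Early-fusion baseline: one linear model on the concatenated features.
X_early = np.hstack([X_audio, X_video])
early_mse = np.mean((predict_linear(fit_ridge(X_early, y), X_early) - y) ** 2)
print(f"MoE MSE: {moe_mse:.3f}, early-fusion MSE: {early_mse:.3f}")
```

In this toy setting, the gate learns which modality is informative for each sample. A single early-fusion linear model cannot make that switch, which is the intuition behind dynamic fusion. The paper's actual features, experts, and gating model may differ.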

Date: March 20, 2016
Authors: Ankit Goyal, Naveen Kumar, Tanaya Guha, Shrikanth S Narayanan
Conference: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages: 2822-2826
Publisher: IEEE

