Publications
A multimodal mixture-of-experts model for dynamic emotion prediction in movies
Abstract
This paper addresses the problem of continuous emotion prediction in movies from multimodal cues. The rich emotional content in movies is inherently multimodal: emotion is evoked through both audio (music, speech) and video modalities. To capture such affective information, we put forth a set of audio and video features that includes several novel features, such as Video Compressibility and Histogram of Facial Area (HFA). We propose a Mixture of Experts (MoE)-based fusion model that dynamically combines information from the audio and video modalities to predict the emotion evoked in movies. A learning module, based on the hard Expectation-Maximization (EM) algorithm, is presented for the MoE model. Experiments on a database of popular movies demonstrate that our MoE-based fusion method outperforms popular fusion strategies (e.g., early and late fusion) in the context of dynamic emotion …
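As a rough illustration of the MoE-style fusion the abstract describes, the toy sketch below combines one audio expert and one video expert through a softmax gate, and shows a hard assignment step of the kind used in hard EM. It is not the paper's model: the linear experts, the gate input (concatenated audio and video features), and all names (`moe_predict`, `hard_assign`, `experts`, `gate_weights`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def moe_predict(audio_feat, video_feat, experts, gate_weights):
    """Fuse per-modality predictions with input-dependent gate weights.

    Assumption: each expert is a linear regressor on its own modality,
    and the gate sees the concatenated audio + video feature vector.
    """
    preds = [
        dot(experts["audio"], audio_feat),
        dot(experts["video"], video_feat),
    ]
    gate_input = audio_feat + video_feat  # list concatenation
    gates = softmax([dot(w, gate_input) for w in gate_weights])
    fused = sum(g * p for g, p in zip(gates, preds))
    return fused, gates


def hard_assign(audio_feat, video_feat, target, experts):
    """Hard E-step (illustrative): pick the single expert whose
    prediction has the smallest squared error for this sample."""
    errors = {
        "audio": (dot(experts["audio"], audio_feat) - target) ** 2,
        "video": (dot(experts["video"], video_feat) - target) ** 2,
    }
    return min(errors, key=errors.get)


# Hypothetical parameters and features, chosen only to exercise the code.
experts = {"audio": [1.0, 0.0], "video": [0.0, 2.0]}
gate_weights = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
fused, gates = moe_predict([1.0, 0.0], [0.0, 1.0], experts, gate_weights)
```

In a hard-EM style training loop, each sample would be assigned to one expert with something like `hard_assign`, after which the experts and the gate would be refit on their assigned samples. Early fusion, by contrast, would train a single model on the concatenated features, and late fusion would combine expert outputs with fixed weights.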
- Date
- March 20, 2016
- Authors
- Ankit Goyal, Naveen Kumar, Tanaya Guha, Shrikanth S Narayanan
- Conference
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Pages
- 2822-2826
- Publisher
- IEEE