Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

Abstract

In recent years, machine learning techniques have produced state-of-the-art results in several audio-related tasks. The success of these approaches is largely due to access to large open-source datasets and improved computational resources. However, these methods often fail to generalize to tasks from real-life scenarios because of domain mismatch. One such task is foreground speech detection from wearable audio devices. Dynamically varying environmental conditions, including background speakers and TV or radio audio, make foreground speech detection challenging. Moreover, obtaining precise moment-to-moment annotations of audio streams for analysis and model training is time-consuming and costly. In this work, we use multiple instance learning (MIL) to facilitate …

Date: 2021
Authors: Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F Danvers, Angelina J Polsinelli, Suzanne A Moseley, David A Sbarra, Matthias R Mehl, Shrikanth Narayanan
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Volume: 2021
Issue: 1
Pages: 7
Publisher: Springer International Publishing

Information Sciences Institute
