Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech.

Abstract

The paper aims to address the task of speaker verification with single-channel, noisy and far-field speech by learning an embedding or feature representation that is invariant to different acoustic environments. We approach from two different directions. First, we adopt a newly proposed discriminative model that hybridizes Deep Neural Network (DNN) and Total Variability Model (TVM) with the goal of integrating their strengths. DNN helps learning a unique variable length representation of the feature sequence while TVM accumulates them into a fixed dimensional vector. Second, we propose a multitask training scheme with cross entropy and triplet losses in order to obtain good classification performance as well as distinctive speaker embeddings. The multi-task training is applied on both the DNN-TVM model and state-of-the-art x-vector system. The results on the development and evaluation sets of the VOiCES challenge reveal that the proposed multi-task training helps improving models that are solely based on cross entropy, and it works better with DNN-TVM architecture than x-vector for the current task. Moreover, the multi-task models tend to show complementary relationship with cross entropy models, and thus improved performance is observed after fusion.

Date: January 1, 1970
Authors: Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis G Georgiou, Shrikanth Narayanan
Conference: Interspeech
Pages: 2463-2467

View Paper

Information Sciences Institute

Publications

Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech.

Abstract