Publications

Theoretical analysis of diversity in an ensemble of automatic speech recognition systems

Abstract

Diversity or complementarity of automatic speech recognition (ASR) systems is crucial for achieving a reduction in word error rate (WER) upon fusion using the ROVER algorithm. We present a theoretical proof explaining this often-observed link between ASR system diversity and ROVER performance. This is in contrast to many previous works that have only presented empirical evidence for this link or have focused on designing diverse ASR systems using intuitive algorithmic modifications. We prove that the WER of the ROVER output approximately decomposes into a difference of the average WER of the individual ASR systems and the average WER of the ASR systems with respect to the ROVER output. We refer to the latter quantity as the diversity of the ASR system ensemble because it measures the spread of the ASR hypotheses about the ROVER hypothesis. This result explains the trade-off between the …
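
The decomposition described in the abstract can be written schematically as below. The notation is an assumption introduced here for illustration and is not taken from the paper: M is the number of ASR systems, h_m is the hypothesis of system m, h^* is the ROVER output, r is the reference transcript, and WER(a, b) is the word error rate of a measured against b.

```latex
% Schematic form of the decomposition stated in the abstract.
% All symbols (M, h_m, h^*, r) are illustrative assumptions, not the paper's notation.
\[
\mathrm{WER}(h^{*}, r)
\;\approx\;
\underbrace{\frac{1}{M}\sum_{m=1}^{M}\mathrm{WER}(h_m, r)}_{\text{average individual WER}}
\;-\;
\underbrace{\frac{1}{M}\sum_{m=1}^{M}\mathrm{WER}(h_m, h^{*})}_{\text{diversity}}
\]
```

Read this way, the second term is the quantity the abstract calls diversity: for a fixed average individual WER, a larger spread of the hypotheses around the ROVER output corresponds to a lower approximate WER for the ROVER output.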

Date
2014
Authors
Kartik Audhkhasi, Andreas M Zavou, Panayiotis G Georgiou, Shrikanth S Narayanan
Journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume
22
Issue
3
Pages
711-726
Publisher
IEEE