Publications

Robust speaker clustering strategies to data source variation for improved speaker diarization

Abstract

Agglomerative hierarchical clustering (AHC) has been widely used in speaker diarization systems to classify speech segments in a given data source by speaker identity, but it is known not to be robust to data source variation. In this paper, we identify one of the key potential sources of this variability that negatively affects clustering error rate (CER), namely short speech segments, and propose three solutions to tackle this issue. In experiments on various meeting conversation excerpts, the proposed methods outperform simple AHC, with relative CER improvements in the range of 17-32%.

Date
2007
Authors
Kyu J Han, Samuel Kim, Shrikanth S Narayanan
Conference
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
Pages
262-267
Publisher
IEEE