Publications
Use of model transformations for distributed speech recognition
Abstract
Due to bandwidth limitations, the speech recognizer in distributed speech recognition (DSR) applications has to operate on encoded speech, using either traditional speech coding or speech coding optimized for recognition. The penalty incurred in reducing the bitrate is a degradation in speech recognition performance. The diversity of applications using DSR implies that a variety of speech encoders can be used to compress the speech. By treating the encoder variability as a mismatch, we propose using model transformation to reduce the degradation in speech recognition performance. The advantage of using model transformation is that only a single model set needs to be trained at the server, and it can be adapted on the fly to the incoming speech data. Using MAP adaptation, we were able to reduce the word error rate by 61.9%, 63.3% and 56.3% for MELP-, GSM- and MFCC-encoded data, respectively, which shows the …
- Authors
- Naveen Srinivasamurthy, Shrikanth Narayanan, Antonio Ortega
- Journal
- ISCA Workshop on Adaptation Methods for Speech Recognition
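The MAP adaptation mentioned in the abstract can be sketched in a few lines. Below is a minimal NumPy illustration of MAP-adapting Gaussian mean vectors toward a small amount of adaptation data; the function name `map_adapt_means`, the relevance factor `tau`, and the simplification to bare mean vectors (rather than full HMM state distributions) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def map_adapt_means(prior_means, frames, responsibilities, tau=10.0):
    """MAP-adapt Gaussian mean vectors toward adaptation data.

    prior_means:      (K, D) means trained once at the server
    frames:           (T, D) feature frames from the encoded speech
    responsibilities: (T, K) posterior occupancies gamma_t(k)
    tau:              relevance factor weighting the prior means
    """
    occupancy = responsibilities.sum(axis=0)          # (K,) soft frame counts
    weighted = responsibilities.T @ frames            # (K, D) weighted data sums
    # Each adapted mean is a count-weighted blend of the prior mean
    # and the data mean; with little data it stays near the prior.
    return (tau * prior_means + weighted) / (tau + occupancy)[:, None]
```

The interpolation behavior is what makes a single server-side model set workable: components with little adaptation data stay close to their priors, while well-observed components move toward the statistics of the encoder-specific input.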