Publications

Spectro-temporal directional derivative features for automatic speech recognition.

Abstract

We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, are able to accurately capture the acoustic modulations of speech, even in high-noise conditions. Therefore, spectro-temporal features extracted from the wavelet transformation of the spectrogram offer additional noise robustness to important signal processing tasks, such as voice activity detection and speech recognition. In this paper, we explore the use of the steerable pyramid, a directional wavelet transform that is common in image processing, to derive a spectro-temporal feature representation of speech that can serve as an alternative to cepstral derivatives and Gabor filterbank …
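As a rough illustration of what a directional derivative of a time-frequency image looks like, the sketch below combines the time and frequency partial derivatives of a spectrogram-shaped array at several orientations. It is not the steerable pyramid used in the paper: it assumes plain finite differences (`numpy.gradient`) at a single scale, and the function names `directional_derivative` and `directional_features`, the number of orientations, and the toy input are all hypothetical choices made for this example.

```python
import numpy as np


def directional_derivative(spec, theta):
    """First-order directional derivative of a 2-D time-frequency array.

    spec  : array of shape (n_bands, n_frames), e.g. a log-Mel spectrogram
    theta : direction in radians; 0 points along time, pi/2 along frequency

    Assumption: finite differences stand in for the band-pass derivative
    filters of a real steerable pyramid.
    """
    # np.gradient returns one array per axis: axis 0 (frequency), axis 1 (time).
    d_freq, d_time = np.gradient(np.asarray(spec, dtype=float))
    # A first derivative "steers": any orientation is a linear combination
    # of the two axis-aligned derivatives.
    return np.cos(theta) * d_time + np.sin(theta) * d_freq


def directional_features(spec, n_orientations=4):
    """Stack derivatives at evenly spaced orientations in [0, pi)."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    return np.stack([directional_derivative(spec, t) for t in thetas])


# Toy input (not real speech): a linear ramp rising by 2 per frequency band
# and by 3 per time frame, so the derivatives are known in closed form.
bands, frames = 8, 20
spec = np.add.outer(2.0 * np.arange(bands), 3.0 * np.arange(frames))
feats = directional_features(spec)  # shape: (orientations, bands, frames)
```

On the toy ramp, the orientation along time recovers the slope of 3 per frame and the orientation along frequency recovers the slope of 2 per band; intermediate orientations mix the two. A multi-scale, multi-orientation transform such as the steerable pyramid applies the same idea with smoothed filters at several resolutions.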

Date
October 13, 2025
Authors
James Gibson, Maarten Van Segbroeck, Antonio Ortega, Panayiotis G Georgiou, Shrikanth S Narayanan
Conference
INTERSPEECH
Pages
872-875