ISCA Archive Interspeech 2013

Spectro-temporal directional derivative features for automatic speech recognition

James Gibson, Maarten Van Segbroeck, Antonio Ortega, Panayiotis G. Georgiou, Shrikanth Narayanan

We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel-spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, can accurately capture the acoustic modulations of speech, even in high-noise conditions. Spectro-temporal features extracted from a wavelet transform of the spectrogram therefore offer additional noise robustness for important signal processing tasks such as voice activity detection and speech recognition. In this paper, we explore the use of the steerable pyramid, a directional wavelet transform that is common in image processing, to derive a spectro-temporal feature representation of speech that can serve as an alternative to cepstral derivatives and Gabor filter-bank features. We discuss the application of these features to robust automatic speech recognition. Experiments conducted on the Aurora-2 database demonstrate that their robustness is competitive with that of other state-of-the-art speech features, especially in low signal-to-noise ratio conditions.
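
To make the idea of a directional derivative on a spectrogram concrete, the following is a minimal sketch and not the authors' implementation. It does not build a steerable pyramid; it only uses the first-order steering property, in which the derivative along an angle is a cosine/sine-weighted sum of the time and frequency derivatives. The function name `directional_derivatives`, the parameter `n_orientations`, and the choice of evenly spaced angles are assumptions made for illustration.

```python
import numpy as np


def directional_derivatives(spec, n_orientations=4):
    """Steer first-order derivatives of a 2-D spectrogram to several angles.

    spec: array of shape (n_bands, n_frames), e.g. a log Mel-spectrogram.
    Returns an array of shape (n_orientations, n_bands, n_frames).
    Illustrative only: a single-scale finite-difference approximation,
    not the multi-scale steerable pyramid named in the abstract.
    """
    spec = np.asarray(spec, dtype=float)
    # np.gradient returns one derivative per axis:
    # axis 0 is frequency (bands), axis 1 is time (frames).
    d_freq, d_time = np.gradient(spec)
    # Assumed orientation set: evenly spaced angles in [0, pi).
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    # Derivative along angle theta, measured from the time axis.
    return np.stack([np.cos(t) * d_time + np.sin(t) * d_freq for t in thetas])
```

With four orientations, index 0 corresponds to a purely temporal derivative and index 2 to a purely spectral one; the remaining two respond to diagonal (rising or falling) spectro-temporal patterns.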


doi: 10.21437/Interspeech.2013-258

Cite as: Gibson, J., Van Segbroeck, M., Ortega, A., Georgiou, P.G., Narayanan, S. (2013) Spectro-temporal directional derivative features for automatic speech recognition. Proc. Interspeech 2013, 872-875, doi: 10.21437/Interspeech.2013-258

@inproceedings{gibson13_interspeech,
  author={James Gibson and Maarten Van Segbroeck and Antonio Ortega and Panayiotis G. Georgiou and Shrikanth Narayanan},
  title={{Spectro-temporal directional derivative features for automatic speech recognition}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={872--875},
  doi={10.21437/Interspeech.2013-258}
}