15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Comparing Time-Frequency Representations for Directional Derivative Features

James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan

University of Southern California, USA

We compare the performance of Directional Derivative features for automatic speech recognition when extracted from different time-frequency representations. Specifically, we use the short-time Fourier transform, Mel-frequency, and Gammatone spectrograms as bases from which we extract spectro-temporal modulations. We then assess the noise robustness of each representation, varying the number of frequency bins and the dynamic range compression scheme, for both word and phone recognition. We find that the choice of dynamic range compression approach has the most significant impact on recognition performance, whereas the performance differences between perceptually motivated filter-banks are minimal in the proposed framework. Furthermore, this work presents significant gains in speech recognition accuracy at low SNRs over MFCCs, GFCCs, and Directional Derivatives extracted from the log-Mel spectrogram.
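
As an illustration only, the two ingredients named in the abstract (dynamic range compression followed by derivatives taken along a direction in the time-frequency plane) can be sketched as below. This is not the authors' implementation: the function names, the cube-root exponent, the flooring constant, and the finite-difference scheme are all assumptions chosen for the example, and the abstract does not say which compression schemes or derivative operators the paper actually uses. The sketch relies only on the standard definition of a directional derivative, cos(theta) * dS/dt + sin(theta) * dS/df, for a spectrogram S indexed by frame t and frequency bin f.

```python
import math


def compress(spec, scheme="log", power=1.0 / 3.0, floor=1e-10):
    """Compress the dynamic range of a non-negative spectrogram.

    spec is a list of frames, each a list of per-bin energies.
    The scheme names, exponent, and floor are assumptions for this sketch.
    """
    if scheme == "log":
        # Floor the values so that log(0) never occurs.
        return [[math.log(max(v, floor)) for v in frame] for frame in spec]
    if scheme == "root":
        # Power-law (root) compression.
        return [[max(v, 0.0) ** power for v in frame] for frame in spec]
    raise ValueError(f"unknown scheme: {scheme}")


def _diff(values, i):
    """Central difference in the interior, one-sided difference at the edges."""
    n = len(values)
    if n < 2:
        return 0.0
    if i == 0:
        return values[1] - values[0]
    if i == n - 1:
        return values[-1] - values[-2]
    return (values[i + 1] - values[i - 1]) / 2.0


def directional_derivative(spec, theta):
    """Derivative of spec[t][f] along the unit vector (cos theta, sin theta).

    theta = 0 differentiates along time, theta = pi/2 along frequency.
    """
    n_frames = len(spec)
    n_bins = len(spec[0])
    ct, st = math.cos(theta), math.sin(theta)
    out = []
    for t in range(n_frames):
        row = []
        for f in range(n_bins):
            d_time = _diff([spec[k][f] for k in range(n_frames)], t)
            d_freq = _diff(spec[t], f)
            row.append(ct * d_time + st * d_freq)
        out.append(row)
    return out
```

A feature set in this spirit would stack the outputs of `directional_derivative` for several values of `theta`, computed on the compressed STFT, Mel, or Gammatone spectrogram; how many directions to use and how to post-process them is left open here.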

Bibliographic reference. Gibson, James / Van Segbroeck, Maarten / Narayanan, Shrikanth S. (2014): "Comparing time-frequency representations for directional derivative features", in INTERSPEECH-2014, 612-615.