We compare the performance of Directional Derivatives features for automatic speech recognition when extracted from different time-frequency representations. Specifically, we use the short-time Fourier transform, Mel-frequency, and Gammatone spectrograms as a base from which we extract spectro-temporal modulations. We then assess the noise robustness of each representation with varied number of frequency bins and dynamic range compression schemes for both word and phone recognition. We find that the choice of dynamic range compression approach has the most significant impact on recognition performance. Whereas, the performance differences between perceptually motivated filter-banks are minimal in the proposed framework. Furthermore, this work presents significant gains in speech recognition accuracy for low SNRs over MFCCs, GFCCs, and Directional Derivatives extracted from the log-Mel spectrogram.
Bibliographic reference. Gibson, James / Segbroeck, Maarten Van / Narayanan, Shrikanth S. (2014): "Comparing time-frequency representations for directional derivative features", In INTERSPEECH-2014, 612-615.