8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Neural "Spike Rate Spectrum" as a Noise Robust, Speaker Invariant Feature for Automatic Speech Recognition

T. V. Sreenivas, G. V. Kiran, A. G. Krishna

Indian Institute of Science, India

A new feature set for automatic speech recognition (ASR), called Rate-Spectrum (RS), is proposed. RS is a spectral representation obtained from a computational auditory model. The feature is noise-robust and considerably speaker-invariant. RS matches the smoothed log spectrum in both shape and dynamic range variation. A discrete cosine transform (DCT) is applied to RS to reduce its dimensionality, giving RS-DCT. The proposed features are compared with MFCC in an isolated word recognition experiment on the TI Digits database, for both clean and noisy speech. For speakers seen during training, RS and RS-DCT outperform MFCC in the noisy case and match its performance in the clean case. For unseen speakers, RS does better than MFCC in the clean case, and RS-DCT outperforms MFCC in the noisy case. These results show that the proposed feature is noise-robust and speaker-invariant.
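The DCT-based dimensionality reduction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the DCT variant, the number of auditory channels, or how many coefficients are kept, so the orthonormal DCT-II, the 8-value frame, and the names `dct_ii` and `reduce_with_dct` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence of floats (assumed variant)."""
    n = len(x)
    out = []
    for k in range(n):
        # Project the frame onto the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Orthonormal scaling so that total energy is preserved.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def reduce_with_dct(spectrum, num_coeffs):
    """Keep only the first num_coeffs DCT coefficients of one spectral frame."""
    return dct_ii(spectrum)[:num_coeffs]


# Hypothetical 8-channel spectral frame; the values are made up.
frame = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
features = reduce_with_dct(frame, 4)  # 8 values reduced to 4 coefficients
```

Keeping only the low-order coefficients retains the smooth spectral envelope while discarding fine detail, which is the same reasoning used when truncating cepstral coefficients in MFCC front ends.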


Bibliographic reference. Sreenivas, T. V. / Kiran, G. V. / Krishna, A. G. (2004): "Neural 'spike rate spectrum' as a noise robust, speaker invariant feature for automatic speech recognition", In INTERSPEECH-2004, 929-932.