ISCA Archive Interspeech 2013

Comparison of spectral analysis methods for automatic speech recognition

Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian

In this paper, we evaluate the front end of Automatic Speech Recognition (ASR) systems with respect to several widely used spectral processing methods. We show experimentally that using FFT spectral values directly is just as effective as using either a Mel or a Gammatone filter bank as an intermediate processing stage, provided the cosine basis vectors used for dimensionality reduction are appropriately modified. Furthermore, trajectory features computed over intervals of approximately 300 ms are considerably more effective, in terms of ASR accuracy, than the delta and delta-delta terms often used for ASR. Although using a filter bank carries no major performance disadvantage, simplicity of analysis is a reason to eliminate this step in speech processing. The experimental results that confirm these assertions are based on the TIMIT phonetically labeled database, and the assertions hold for both clean and noisy speech.
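The processing chain summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 16 kHz signal, 25 ms frames with a 10 ms hop, and plain (unmodified) DCT-II cosine basis vectors; the abstract does not describe how the paper modifies the basis vectors, so that step is not reproduced here. The function names and all parameter values are hypothetical. Trajectory features are approximated by a cosine expansion over a sliding block of 30 frames, which corresponds to roughly 300 ms at a 10 ms hop.

```python
import numpy as np

def cosine_basis(n_terms, n_points):
    """Rows are plain DCT-II cosine basis vectors over n_points samples.
    Assumption: the paper's modified basis is NOT reproduced here."""
    k = np.arange(n_terms)[:, None]
    n = np.arange(n_points)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_points)

def spectral_features(signal, frame_len=400, hop=160, n_fft=512, n_terms=13):
    """Log FFT magnitude spectrum per frame (no filter bank),
    reduced in dimension with cosine basis vectors over frequency."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    log_spec = np.log(np.abs(np.fft.rfft(frames, n_fft, axis=1)) + 1e-10)
    basis = cosine_basis(n_terms, log_spec.shape[1])
    return log_spec @ basis.T  # shape: (n_frames, n_terms)

def trajectory_features(feats, block_len=30, n_time_terms=4):
    """Cosine expansion over time of each feature within a sliding block
    of block_len frames (about 300 ms at a 10 ms hop)."""
    basis = cosine_basis(n_time_terms, block_len)
    n_blocks = feats.shape[0] - block_len + 1
    out = np.stack([basis @ feats[t:t + block_len] for t in range(n_blocks)])
    return out.reshape(n_blocks, -1)  # shape: (n_blocks, n_time_terms * n_terms)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sig = rng.standard_normal(16000)   # one second of synthetic noise, not speech
    static = spectral_features(sig)
    dynamic = trajectory_features(static)
    print(static.shape, dynamic.shape)
```

In this sketch the only difference between a filter-bank front end and the direct FFT front end would be an extra matrix multiplication before the logarithm, which is the step the abstract argues can be dropped.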


doi: 10.21437/Interspeech.2013-742

Cite as: Parinam, V.N., Vootkuri, C., Zahorian, S.A. (2013) Comparison of spectral analysis methods for automatic speech recognition. Proc. Interspeech 2013, 3356-3360, doi: 10.21437/Interspeech.2013-742

@inproceedings{parinam13_interspeech,
  author={Venkata Neelima Parinam and Chandra Vootkuri and Stephen A. Zahorian},
  title={{Comparison of spectral analysis methods for automatic speech recognition}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3356--3360},
  doi={10.21437/Interspeech.2013-742}
}