ISCA Archive ICSLP 1998

Time shift invariant speech recognition

Sankar Basu, Abraham Ittycheriah, Stéphane Maes

When a speech signal is shifted by a few samples, we have observed significant variations in the feature vectors produced by the acoustic front end. Furthermore, when these shifted utterances are decoded with a continuous speech recognition system, they lead to dramatically different word error rates. This paper analyzes the phenomenon and illustrates the well-known result that classical acoustic front-end processors, including spectrum- and cepstrum-based techniques, are sensitive to time shifts. After describing the effect of sample-sized shifts on the spectral estimates of the signal, we propose several techniques that take advantage of shift variations to multiply the amount of training data that speech utterances can provide. Finally, we illustrate how the acoustic front end can be slightly modified to render the recognizer invariant to small shifts.
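The time-shift sensitivity described above can be reproduced with a small experiment. The sketch below is not the authors' front end: it assumes a plain log-magnitude spectral front end (Hamming window, naive DFT, 64-sample frames, 32-sample hop) and uses seeded random noise as a stand-in for speech. The names `features`, `shift_signal`, `shift_augment` and `mean_abs_diff` are hypothetical helpers introduced only for illustration; `shift_augment` shows the general idea of turning one utterance into several shifted training copies, not the specific techniques proposed in the paper.

```python
import cmath
import math
import random


def hamming(n_samples):
    """Hamming window coefficients (a common front-end choice, assumed here)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_log_spectrum(frame, window):
    """Log-magnitude DFT (bins 0..N/2) of one windowed frame, via a naive DFT."""
    n_samples = len(frame)
    windowed = [x * w for x, w in zip(frame, window)]
    spectrum = []
    for k in range(n_samples // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, x in enumerate(windowed))
        spectrum.append(math.log(abs(coeff) + 1e-10))  # floor avoids log(0)
    return spectrum


def features(signal, frame_len=64, hop=32):
    """Frame the signal on a fixed grid starting at sample 0; return log spectra."""
    window = hamming(frame_len)
    return [frame_log_spectrum(signal[start:start + frame_len], window)
            for start in range(0, len(signal) - frame_len + 1, hop)]


def shift_signal(signal, shift):
    """Shift the signal relative to the frame grid by dropping `shift` samples."""
    return signal[shift:]


def shift_augment(signal, max_shift):
    """Hypothetical augmentation: one training copy per shift 0..max_shift."""
    return [shift_signal(signal, d) for d in range(max_shift + 1)]


def mean_abs_diff(feats_a, feats_b):
    """Mean absolute difference over the frames both feature sequences share."""
    pairs = [(a, b) for fa, fb in zip(feats_a, feats_b) for a, b in zip(fa, fb)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def test_signal(n_samples=512, seed=0):
    """Deterministic noise-like stand-in for speech (assumption, not real data)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


if __name__ == "__main__":
    x = test_signal()
    base = features(x)
    hop = 32
    for shift in (1, 2, 3, hop):
        # A shift of k * hop only drops k frames, so realign before comparing.
        ref = base[shift // hop:]
        print(shift, mean_abs_diff(ref, features(shift_signal(x, shift))))
```

Shifts of 1 to 3 samples change the log spectrum of every frame, whereas a shift of exactly one hop merely drops the first frame, so the last printed difference is 0.0. The nonzero differences for sub-hop shifts are what make shifted copies of an utterance usable as additional, non-identical training examples.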

doi: 10.21437/ICSLP.1998-656

Cite as: Basu, S., Ittycheriah, A., Maes, S. (1998) Time shift invariant speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0983, doi: 10.21437/ICSLP.1998-656

  author={Sankar Basu and Abraham Ittycheriah and Stéphane Maes},
  title={{Time shift invariant speech recognition}},
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0983},