ISCA Archive ICSLP 1998

Wavelet-based energy binning cepstral features for automatic speech recognition

Sankar Basu, Stéphane Maes

Speech production models, coding methods, and text-to-speech technology often lead to the introduction of modulation models, which represent speech signals by primary components that are amplitude- and phase-modulated sine functions. Parallels between the properties of the wavelet transform of primary components and algorithmic representations of speech signals derived from auditory nerve models such as the EIH lead to the introduction of synchrosqueezing measures. On the other hand, in automatic speech (and speaker) recognition, cepstral features have imposed themselves quasi-universally as the acoustic characteristics of speech utterances. This paper analyses the cepstral representation in the context of the synchrosqueezed representation: the wastrum. It discusses wastra derived from energy accumulation, as opposed to classical MEL- and LPC-derived cepstra. In the former method, the primary components and formants play a central role. Recognition results are presented on the Wall Street Journal database using the IBM continuous decoder.


doi: 10.21437/ICSLP.1998-563

Cite as: Basu, S., Maes, S. (1998) Wavelet-based energy binning cepstral features for automatic speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0982, doi: 10.21437/ICSLP.1998-563

@inproceedings{basu98_icslp,
  author={Sankar Basu and Stéphane Maes},
  title={{Wavelet-based energy binning cepstral features for automatic speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0982},
  doi={10.21437/ICSLP.1998-563}
}