5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Techniques For Capturing Temporal Variations In Speech Signals With Fixed-Rate Processing

Satya Dharanipragada (1), Ramesh A. Gopinath (1), Bhaskar D. Rao (2)

(1) IBM TJ Watson Research Center, USA
(2) University of California, San Diego, USA

Fixed-rate feature extraction, which is used in most current speech recognizers, is equivalent to sampling the feature trajectories at a uniform rate. Often this sampling rate is well below the Nyquist rate of the trajectories, so aliasing distorts the sampled feature stream. In this paper we explore various techniques, ranging from simple cepstral and spectral smoothing to filtering and data-driven dimensionality expansion using Linear Discriminant Analysis (LDA), to counter aliasing and to account for the variable-rate nature of information in speech signals. Smoothing in the spectral domain reduces the variance of the short-term spectral estimates, which translates directly into a reduction in the variances of the Gaussians in the acoustic models. With these techniques we obtain modest improvements, both in word error rate and in robustness to noise, on large-vocabulary speech recognition tasks.
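
The variance-reduction claim rests on a general statistical fact: for a stationary signal, a raw periodogram estimate has a standard deviation roughly equal to its mean, and averaging K adjacent frequency bins divides the variance by about K when those bins are uncorrelated. The sketch below illustrates only that fact, not the authors' smoothing scheme. It assumes white Gaussian noise in place of speech, a hypothetical 64-sample frame, and a 5-bin moving average across frequency (the helper names `periodogram`, `smooth_spectrum` and `compare` are illustrative). It uses only the Python standard library.

```python
import cmath
import math
import random


def periodogram(x):
    """Raw periodogram |X[k]|^2 / N for k = 0..N/2, via a naive DFT."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(acc) ** 2 / n)
    return out


def smooth_spectrum(p, half_width):
    """Moving average over 2*half_width+1 adjacent bins (window truncated at the edges)."""
    out = []
    for k in range(len(p)):
        lo = max(0, k - half_width)
        hi = min(len(p), k + half_width + 1)
        out.append(sum(p[lo:hi]) / (hi - lo))
    return out


def pooled_bin_variance(spectra, bins):
    """Across-trial variance of the estimate at each bin, averaged over `bins`."""
    total = 0.0
    for k in bins:
        vals = [s[k] for s in spectra]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    return total / len(bins)


def compare(n=64, trials=300, half_width=2, seed=0):
    """Return (raw variance, smoothed variance) for unit-variance white noise frames."""
    rng = random.Random(seed)
    raw, smoothed = [], []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p = periodogram(x)
        raw.append(p)
        smoothed.append(smooth_spectrum(p, half_width))
    # Interior bins only: each smoothing window stays clear of DC and Nyquist.
    bins = range(half_width + 1, n // 2 - half_width)
    return pooled_bin_variance(raw, bins), pooled_bin_variance(smoothed, bins)


if __name__ == "__main__":
    raw_var, smooth_var = compare()
    print(f"raw periodogram variance: {raw_var:.3f}")
    print(f"5-bin smoothed variance:  {smooth_var:.3f}")
```

With unit-variance white noise the raw variance should come out near 1 and the smoothed variance near 1/5. The price of smoothing is coarser frequency resolution, which is why the width of the smoothing window is a design choice rather than a free gain.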

Bibliographic reference: Dharanipragada, Satya / Gopinath, Ramesh A. / Rao, Bhaskar D. (1998): "Techniques for capturing temporal variations in speech signals with fixed-rate processing", In ICSLP-1998, paper 0590.