ISCA Archive Interspeech 2005

On variable-scale piecewise stationary spectral analysis of speech signals for ASR

Vivek Tyagi, Christian Wellekens, Hervé Bourlard

A fixed-scale (typically 25 ms) short-time spectral analysis of speech signals, which are inherently multi-scale in nature (vowels typically last 40-80 ms, while stops last 10-20 ms), is clearly sub-optimal in terms of time-frequency resolution. Based on the usual assumption that the speech signal can be modeled by a time-varying autoregressive (AR) Gaussian process, we estimate the largest piecewise quasi-stationary speech segments, based on the likelihood that a segment was generated by the same AR process. This likelihood is estimated from the Linear Prediction (LP) residual error. Each of these quasi-stationary segments is then used as an analysis window from which spectral features are extracted. Such an approach results in a variable-scale spectral analysis that adaptively estimates the largest possible analysis window size over which the signal remains quasi-stationary, and thus the best time/frequency resolution trade-off. Speech recognition experiments on the OGI Numbers95 database show that features based on the proposed variable-scale piecewise stationary spectral analysis indeed yield improved recognition accuracy in clean conditions, compared to features based on the minimum cross entropy spectrum [1] as well as those based on fixed-scale spectral analysis.
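The abstract does not spell out the likelihood computation, so the following is only a minimal sketch of one common way to test, from LP residual energies, whether a candidate window was generated by a single AR Gaussian process. It compares one AR fit over the whole window against separate fits on its two halves (a generalized likelihood ratio). The function names, the least-squares LP fit, the default order, and the synthetic AR(2) test signals are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def lp_residual_variance(x, order):
    """Least-squares LP fit; returns the mean squared prediction error.

    Assumption: covariance-method least squares, chosen here for brevity.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n <= 2 * order:
        raise ValueError("segment too short for the requested LP order")
    # Column k-1 holds x[t-k] for every predicted sample t = order .. n-1.
    past = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ coeffs
    # Clamp so the logarithm below stays finite for a perfectly predictable segment.
    return max(float(np.mean(residual ** 2)), 1e-12)


def glr_distance(x, split, order=4):
    """Log-likelihood-ratio statistic: one AR model vs. two, split at `split`.

    Under a Gaussian AR model the maximized log-likelihood of a segment is,
    up to constants, -(N/2) * log(residual variance). Large values suggest
    that the two halves come from different AR processes.
    """
    x = np.asarray(x, dtype=float)
    # Number of samples actually predicted in each of the three fits.
    n0, n1, n2 = len(x) - order, split - order, len(x) - split - order
    s0 = lp_residual_variance(x, order)
    s1 = lp_residual_variance(x[:split], order)
    s2 = lp_residual_variance(x[split:], order)
    return 0.5 * (n0 * np.log(s0) - n1 * np.log(s1) - n2 * np.log(s2))


def ar2_signal(rng, n, radius, angle):
    """Hypothetical test signal: AR(2) process with poles at radius * exp(+/- j*angle)."""
    a1, a2 = 2.0 * radius * np.cos(angle), -radius ** 2
    e = rng.standard_normal(n + 2)
    x = np.zeros(n + 2)
    for t in range(2, n + 2):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x[2:]


rng = np.random.default_rng(0)
# One AR process throughout vs. a change of process at the midpoint.
stationary = ar2_signal(rng, 400, 0.95, 0.3)
changed = np.concatenate(
    [ar2_signal(rng, 200, 0.95, 0.3), ar2_signal(rng, 200, 0.95, 2.0)]
)
print("stationary window:", glr_distance(stationary, 200))
print("window with change:", glr_distance(changed, 200))
```

In a segmentation loop of this kind, an analysis window would typically be grown sample by sample (or frame by frame) until the statistic exceeds a threshold, at which point a segment boundary is declared. The threshold value and the growth step are tuning choices that this sketch leaves open.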


doi: 10.21437/Interspeech.2005-110

Cite as: Tyagi, V., Wellekens, C., Bourlard, H. (2005) On variable-scale piecewise stationary spectral analysis of speech signals for ASR. Proc. Interspeech 2005, 209-212, doi: 10.21437/Interspeech.2005-110

@inproceedings{tyagi05_interspeech,
  author={Vivek Tyagi and Christian Wellekens and Hervé Bourlard},
  title={{On variable-scale piecewise stationary spectral analysis of speech signals for ASR}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={209--212},
  doi={10.21437/Interspeech.2005-110}
}