Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR

Vivek Tyagi (1), Christian Wellekens (1), Hervé Bourlard (2)

(1) Institut Eurécom, Sophia Antipolis, France; (2) IDIAP Research Institute, Switzerland

A fixed-scale (typically 25 ms) short-time spectral analysis of speech signals, which are inherently multi-scale in nature (vowels typically last 40-80 ms, while stops last 10-20 ms), is clearly sub-optimal for time-frequency resolution. Based on the usual assumption that the speech signal can be modeled by a time-varying autoregressive (AR) Gaussian process, we estimate the largest piecewise quasi-stationary speech segments, based on the likelihood that a segment was generated by the same AR process. This likelihood is estimated from the linear prediction (LP) residual error. Each of these quasi-stationary segments is then used as an analysis window from which spectral features are extracted. This approach results in a variable-scale spectral analysis that adaptively estimates the largest possible analysis window size over which the signal remains quasi-stationary, and thus the best time/frequency resolution trade-off. Speech recognition experiments on the OGI Numbers95 database show that features based on the proposed variable-scale piecewise stationary spectral analysis indeed yield improved recognition accuracy in clean conditions, compared to features based on the minimum cross-entropy spectrum [1] as well as those based on fixed-scale spectral analysis.
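The segmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian AR model fitted by the autocorrelation method (Levinson-Durbin), and it uses a generalized likelihood ratio on the LP residual variances to decide whether two adjacent segments plausibly come from the same AR process. The function names (`lp_residual_variance`, `glr_distance`, `segment`) and the values of `order`, `min_len`, `step`, and `threshold` are hypothetical choices made for illustration only.

```python
import math


def lp_residual_variance(x, order):
    """Per-sample LP residual variance (autocorrelation method, Levinson-Durbin)."""
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    err = r[0]
    if err <= 0.0:
        return 1e-12  # all-zero segment: avoid log(0) downstream
    a = [0.0] * (order + 1)  # a[1..m] hold the predictor coefficients
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error energy after order m
        if err <= 0.0:
            return 1e-12
    return max(err / n, 1e-12)


def glr_distance(left, right, order):
    """Log generalized likelihood ratio: one AR model vs. two separate AR models.

    Large values suggest that `left` and `right` were NOT generated by the
    same AR process. With the autocorrelation method the statistic is only
    approximate and can be slightly negative.
    """
    left, right = list(left), list(right)
    n1, n2 = len(left), len(right)
    v1 = lp_residual_variance(left, order)
    v2 = lp_residual_variance(right, order)
    v12 = lp_residual_variance(left + right, order)
    return 0.5 * ((n1 + n2) * math.log(v12)
                  - n1 * math.log(v1)
                  - n2 * math.log(v2))


def segment(signal, order=4, min_len=200, step=50, threshold=40.0):
    """Greedily grow each segment until the GLR test flags a change.

    Returns the start indices of the estimated quasi-stationary segments.
    All default parameter values are illustrative assumptions.
    """
    boundaries = [0]
    start, end = 0, min_len
    while end + min_len <= len(signal):
        d = glr_distance(signal[start:end], signal[end:end + min_len], order)
        if d > threshold:
            boundaries.append(end)  # the next window looks like a new process
            start = end
            end = start + min_len
        else:
            end += step  # still quasi-stationary: extend the analysis window
    return boundaries
```

Each returned boundary marks the start of a new analysis window, so the window length adapts to the signal rather than staying fixed. In practice the threshold would need tuning on real speech, and spectral features would then be extracted from each resulting segment.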


Bibliographic reference. Tyagi, Vivek / Wellekens, Christian / Bourlard, Hervé (2005): "On variable-scale piecewise stationary spectral analysis of speech signals for ASR", In INTERSPEECH-2005, 209-212.