SAPA-SCALE Conference 2012

Portland, OR, USA
September 7-8, 2012

Non-Stationary Signal Processing and its Application in Speech Recognition

Zoltán Tüske, Friedhelm R. Drepper, Ralf Schlüter

Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany

The most widely used acoustic feature extraction methods in current automatic speech recognition (ASR) systems are based on the assumption of stationarity. In this paper, we extensively evaluate a recently introduced filter-stable, non-stationary signal processing method, which relies on an adaptive parttone decomposition of voiced speech to obtain alternative feature vectors for ASR. The non-stationary filterbank yields more noise-robust, amplitude-based features by suppressing the regions between harmonics. Furthermore, by adapting the filter center frequencies to the underlying acoustic modes, it is possible to obtain useful phase features, which can be interpreted in terms of the non-stationary dynamics within the vocal tract. The features are evaluated on a range of tasks, from vowel classification to large-vocabulary continuous speech recognition.

Index Terms: non-stationary, adaptive filter, noise robust, phase features, ASR

Bibliographic reference. Tüske, Zoltán / Drepper, Friedhelm R. / Schlüter, Ralf (2012): "Non-stationary signal processing and its application in speech recognition", in SAPA-SCALE-2012, 34-39.