SAPA-SCALE Conference 2012
Portland, OR, USA
The most widely used acoustic feature extraction methods in current automatic speech recognition (ASR) systems are based on the assumption of stationarity. In this paper, we extensively evaluate a recently introduced filter-stable, non-stationary signal processing method that relies on an adaptive part-tone decomposition of voiced speech to obtain alternative feature vectors for ASR. The non-stationary filterbank yields more noise-robust, amplitude-based features by suppressing the regions between harmonics. Furthermore, by adapting the filter center frequencies to the underlying acoustic modes, it is possible to obtain useful phase features that can be interpreted in terms of the non-stationary dynamics within the vocal tract. The features are evaluated on tasks ranging from vowel classification to large-vocabulary continuous speech recognition.
Index Terms: non-stationary, adaptive filter, noise robust, phase features, ASR
Bibliographic reference. Tüske, Zoltán / Drepper, Friedhelm R. / Schlüter, Ralf (2012): "Non-stationary signal processing and its application in speech recognition", In SAPA-SCALE-2012, 34-39.