EUROSPEECH '95
Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

A Unified Approach for Robust Speech Recognition

Pedro J. Moreno, Bhiksha Raj, Richard M. Stern

Department of Electrical and Computer Engineering, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

There are two major structural approaches to robust speech recognition. In the first approach, compensation is performed by modifying the incoming cepstral stream, either by using ML or MMSE methods to estimate parameters characterizing the environmental degradation, by direct frame-by-frame comparisons between speech recorded in high-quality and degraded acoustical environments, or by signal processing techniques such as spectral subtraction. The second approach modifies the statistics of the internal representation of speech cepstra in the classifier so that they more closely resemble the statistics of degraded speech. This paper attempts to unify these approaches by presenting three techniques that share the same basic assumptions and internal structure but differ in whether they modify the incoming speech cepstra or the classifier statistics. We present SNR-dependent multi-vaRiate gAussian-based cepsTral normalization (SNR-RATZ) and SNR-based Blind RATZ (SNR-BRATZ), which modify incoming cepstra, along with STAR (STAtistical Re-estimation), which modifies the internal statistics of the classifier. The algorithms were tested using the SPHINX-II speech recognition system on the CENSUS database, a database of strings of letters and numbers to which unknown additive noise and unknown linear filtering were introduced artificially. While all the algorithms showed good performance, STAR provided lower error rates than any of the algorithms that modify incoming cepstra as the SNR decreased.
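The distinction between the two approaches can be illustrated with a toy sketch. It is not the SNR-RATZ, SNR-BRATZ, or STAR algorithm, since the abstract gives no equations. It assumes, purely for illustration, that a single cepstral coefficient is modeled by a Gaussian mixture and that the environment shifts each component's mean and inflates its variance. All names and numbers (`CLEAN`, `SHIFTS`, `degraded_mixture`, `compensate_frame`) are hypothetical.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical clean-speech mixture: (weight, mean, variance) per component.
CLEAN = [(0.5, -1.0, 0.25), (0.5, 2.0, 0.25)]
# Assumed environment effect per component: (mean shift, variance increase).
SHIFTS = [(0.8, 0.05), (0.3, 0.05)]

def degraded_mixture(clean, shifts):
    """Classifier-side approach: move the model statistics toward degraded speech."""
    return [(w, m + r, v + dv) for (w, m, v), (r, dv) in zip(clean, shifts)]

def compensate_frame(y, clean, shifts):
    """Feature-side approach: MMSE-style estimate of the clean value from degraded y."""
    noisy = degraded_mixture(clean, shifts)
    likelihoods = [w * gauss_pdf(y, m, v) for w, m, v in noisy]
    total = sum(likelihoods)
    posteriors = [l / total for l in likelihoods]
    # Expected mean shift given y, subtracted from the observed value.
    correction = sum(p * r for p, (r, _) in zip(posteriors, shifts))
    return y - correction

if __name__ == "__main__":
    print(degraded_mixture(CLEAN, SHIFTS))
    print(compensate_frame(-0.2, CLEAN, SHIFTS))
```

Both functions use the same assumed shift parameters. The only difference is whether those parameters are applied to the classifier's statistics or used to correct each incoming frame, which mirrors the shared-assumptions framing of the abstract.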


Bibliographic reference.  Moreno, Pedro J. / Raj, Bhiksha / Stern, Richard M. (1995): "A unified approach for robust speech recognition", In EUROSPEECH-1995, 481-484.