EUROSPEECH 2001 Scandinavia
In this paper, methodologies for effective speech recognition are considered along with evaluations on SPINE, an NRL speech-in-noise corpus. When speech is produced in adverse conditions that include high levels of noise, workload task stress, and the Lombard effect, new challenges arise concerning how best to improve recognition performance. Here, we consider tradeoffs among (i) robust features, (ii) front-end noise suppression, (iii) model adaptation, and (iv) training and testing in the same conditions. The type of noise and the recording conditions can significantly affect which signal processing and speech modeling methods are most effective for achieving robust speech recognition. We consider alternative frequency scales (M-MFCC, ExpoLog), feature processing (CMN, VCMN, LP- vs. FFT-based MFCCs), model adaptation (PMC), and combinations of gender-dependent and gender-independent models. In achieving effective recognition performance, computational speed and the availability of adaptation data strongly influence the final result. In particular, while reliable algorithm formulations for addressing specific types of distortion can improve recognition rates, these algorithms cannot reach their full potential without proper front-end data processing to direct the compensation. While parallel banks of speech recognizers can improve recognition performance, their significant computational requirements can render the recognizer useless in actual speech applications.
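The following is a minimal sketch of the cepstral feature normalization mentioned above. It is not the authors' implementation. CMN subtracts the utterance-level mean of each cepstral coefficient. A fixed convolutional channel becomes an additive constant in the cepstral domain, so this step removes it. Several details here are assumptions: the function names `cmn` and `cmvn`, the (frames × coefficients) array layout, and the reading of VCMN as mean plus variance normalization. The paper does not define any of them in this form.

```python
import numpy as np


def cmn(features: np.ndarray) -> np.ndarray:
    """Cepstral mean normalization (hypothetical helper).

    features: array of shape (num_frames, num_ceps) for one utterance.
    Subtracts the per-coefficient mean computed over all frames.
    """
    return features - features.mean(axis=0, keepdims=True)


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mean and variance normalization (assumed reading of 'VCMN').

    After removing the per-coefficient mean, each coefficient is scaled
    to unit variance over the utterance; eps guards against division by zero.
    """
    centered = cmn(features)
    std = features.std(axis=0, keepdims=True)
    return centered / (std + eps)
```

As a usage example, adding the same offset vector to every frame of a synthetic feature matrix, which stands in for a stationary channel, leaves `cmn` output unchanged. Variance normalization additionally removes a per-coefficient scale change. It does so at the cost of discarding utterance-level variance information.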
Bibliographic reference. Hansen, John H. L. / Sarikaya, Ruhi / Yapanel, Umit / Pellom, Bryan (2001): "Robust speech recognition in noise: an evaluation using the SPINE corpus", In EUROSPEECH-2001, 905-911.