EUROSPEECH 2001 Scandinavia
It is well known that noise reduction schemes are beneficial in ASR to reduce training-test mismatch due to noise. However, a significant mismatch may still remain after noise reduction, especially in the non-speech portions of the signals. To reduce the impact of this mismatch, two methods for discarding non-speech acoustic vectors at recognition time are investigated: variable frame rate processing and voice activity detection. Experiments are discussed for Aurora 2 and for SpeechDat Car Italian. Results show that both methods are highly effective for SpeechDat Car Italian. For Aurora 2, however, feature vector selection based on voice activity detection yields hardly any benefit, while variable frame rate processing actually lowers recognition accuracy somewhat. Several possible explanations for the different results observed on the two databases are discussed.
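As a rough illustration of what discarding acoustic vectors can look like, the sketch below shows two generic selection rules: an energy-threshold voice activity detector and a distance-based variable frame rate rule. It is not the method evaluated in the paper. The function names, the `margin_db` and `threshold` parameters, and the decision rules themselves are assumptions chosen only to make the idea concrete.

```python
import math

def frame_energy_db(frame):
    """Mean energy of one frame of samples in dB, floored to avoid log(0)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(energy, 1e-10))

def vad_select(frames, features, margin_db=10.0):
    """Hypothetical energy VAD: keep the feature vectors whose frame energy
    exceeds the quietest frame in the utterance by more than margin_db."""
    energies = [frame_energy_db(f) for f in frames]
    threshold = min(energies) + margin_db
    return [feat for feat, e in zip(features, energies) if e > threshold]

def vfr_select(features, threshold=1.0):
    """Hypothetical variable frame rate rule: keep a vector only if its
    Euclidean distance to the last kept vector exceeds threshold."""
    if not features:
        return []
    kept = [features[0]]
    for feat in features[1:]:
        if math.dist(feat, kept[-1]) > threshold:
            kept.append(feat)
    return kept

# Toy data: two quiet frames followed by two loud frames.
frames = [[0.001] * 4, [0.001] * 4, [1.0] * 4, [1.0] * 4]
features = [[0.0], [0.1], [5.0], [5.1]]
vad_kept = vad_select(frames, features)
vfr_kept = vfr_select(features)
```

On this toy input the VAD rule keeps only the vectors from the two loud frames, while the frame rate rule keeps the first vector and the first one that differs from it substantially. Both rules shorten stationary stretches, which in real recordings are often, but not always, the non-speech portions.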
Bibliographic reference. Veth, Johan de / Mauuary, Laurent / Noe, Bernhard / Wet, Febe de / Sienel, Jürgen / Boves, Louis / Jouvet, Denis (2001): "Feature vector selection to improve ASR robustness in noisy conditions", In EUROSPEECH-2001, 201-204.