International Workshop on Hands-Free Speech Communication (HSC2001)

April 9-11, 2001
Kyoto, Japan

Training a Hands-Free Recognizer with Reverberated Clean Speech and Additive Noise

Volker Stahl, Alexander Fischer, Rolf Bippus

Philips Research Laboratories, Aachen, Germany

Environmental mismatch between training and test conditions is a key problem for robust speech recognition. In hands-free applications, acoustic conditions such as reverberation and noise depend strongly on the target domain, so the training material should cover a whole range of acoustic environments. Data collections in multiple environments are expensive; hence we investigate methods to synthesize training data by transforming clean speech under certain model assumptions about the target domain. To evaluate this approach, time-synchronous recordings of 15,000 utterances by 200 speakers were made in two living rooms with a high-quality close-talk microphone and an inexpensive distant microphone. We compare the performance of a speech recognition system trained under matched conditions on the far-microphone signal with that of a system trained on the close-talk signal with artificial reverberation and additive noise. Relative to matched training, the error rate in the second scenario is 10% higher for a natural number string recognition task and 30% higher for a command phrases task. However, had the system been trained only on clean speech without transformation, the error rates would have been 100% higher for natural numbers and 250% higher for command phrases.
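
The abstract does not state which reverberation or noise model was used, so the following is only a minimal sketch of the general idea (convolve close-talk speech with a room impulse response, then add noise at a chosen signal-to-noise ratio). The function names, the exponentially decaying white-noise impulse response, and the parameters `rt60_s` and `snr_db` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def synth_impulse_response(rt60_s, sample_rate, rng):
    """Hypothetical room model (an assumption, not the paper's): white noise
    whose amplitude envelope decays by 60 dB over rt60_s seconds."""
    n = int(rt60_s * sample_rate)
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60_s)  # amplitude factor 10^-3 (-60 dB) at t = rt60_s
    h = rng.standard_normal(n) * decay
    h[0] = 1.0  # direct path
    return h / np.sqrt(np.sum(h ** 2))  # normalize to unit energy


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that the signal-to-noise ratio equals snr_db, then add it.
    `noise` must be at least as long as `signal`."""
    noise = noise[:len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + gain * noise


def simulate_distant_mic(clean, h, noise, snr_db):
    """Reverberate clean speech with impulse response h, then add noise."""
    reverberated = np.convolve(clean, h)[:len(clean)]  # keep the original length
    return add_noise_at_snr(reverberated, noise, snr_db)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 8000
    clean = rng.standard_normal(sample_rate)  # stand-in for one second of close-talk speech
    noise = rng.standard_normal(sample_rate)  # stand-in for recorded room noise
    h = synth_impulse_response(rt60_s=0.3, sample_rate=sample_rate, rng=rng)
    far = simulate_distant_mic(clean, h, noise, snr_db=10.0)
    print(far.shape)
```

In practice a measured room impulse response and recorded background noise from the target environment could replace the synthetic stand-ins used here; whether the authors did so is not stated in the abstract.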



Bibliographic reference.  Stahl, Volker / Fischer, Alexander / Bippus, Rolf (2001): "Training a hands-free recognizer with reverberated clean speech and additive noise", In HSC2001, 171-174.