International Workshop on Hands-Free Speech Communication (HSC2001), April 9-11, 2001
The paper describes our initial efforts towards noise-robust, small-footprint, triphone-based acoustic models for speech recognition in the car. For that purpose we employ a multi-style training method and investigate the usefulness of in-car training data affected by noise to different degrees. In doing so, we also investigate the relationship between acoustic model size and speaker-independent word error rate, and demonstrate the benefits of using a Bayesian Information Criterion (BIC) to determine an appropriate number of Gaussian mixture components.
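The abstract does not say how the BIC is applied to the acoustic models. The sketch below only illustrates the criterion itself under simplifying assumptions: one-dimensional data, a plain EM-trained Gaussian mixture, and a score of the form log L - lam * (p / 2) * log N, where p is the number of free parameters and N the number of samples. The function names `fit_gmm_1d`, `bic` and `select_num_components` and the penalty weight `lam` are hypothetical illustrations, not the authors' procedure.

```python
import math
import random


def _component_logs(x, weights, means, variances):
    # log(w_j * N(x; mu_j, var_j)) for every mixture component j
    return [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)]


def _logsumexp(logs):
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def fit_gmm_1d(xs, k, iters=50, seed=0):
    """EM for a k-component 1-D Gaussian mixture (assumes xs is not constant).

    Returns (weights, means, variances, total log-likelihood of xs).
    """
    rng = random.Random(seed)
    n = len(xs)
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n
    floor = 1e-3 * var0  # variance floor keeps components from collapsing
    weights, means, variances = [1.0 / k] * k, rng.sample(list(xs), k), [var0] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample
        resp = []
        for x in xs:
            logs = _component_logs(x, weights, means, variances)
            total = _logsumexp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and variances from the posteriors
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, floor)
    ll = sum(_logsumexp(_component_logs(x, weights, means, variances)) for x in xs)
    return weights, means, variances, ll


def bic(log_likelihood, k, n, lam=1.0):
    """BIC = log L - lam * (p / 2) * log n, with p = 3k - 1 free parameters in 1-D."""
    p = 3 * k - 1  # k means + k variances + (k - 1) independent weights
    return log_likelihood - lam * 0.5 * p * math.log(n)


def select_num_components(xs, k_max, lam=1.0):
    """Return the component count with the highest BIC score, plus all scores."""
    scores = {k: bic(fit_gmm_1d(xs, k)[3], k, len(xs), lam)
              for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores
```

For example, on bimodal data such as `[rng.gauss(-5, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]`, `select_num_components(data, k_max=4)` should reject the single-Gaussian model, because the likelihood gain from a second component far outweighs its penalty. Additional components are accepted only while their likelihood gain exceeds `lam * 1.5 * log N`, which is how a BIC-style criterion keeps model size in check.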
We combine different baseline models and compare traditional recognizer output voting schemes with computationally less expensive feature combination methods. While the former yield only small improvements because most of the contributing recognizers make the same types of errors, likelihood combination methods achieve an 18 percent relative improvement on real-life test data with an average SNR of 5.4 dB.
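The contrast between the two approaches can be illustrated with a minimal sketch. It assumes (1) word-level majority voting over hypotheses that are already aligned word by word; real ROVER-style voting first aligns the outputs into a word transition network, a step omitted here; and (2) a log-linear likelihood combination, i.e. a weighted sum of each model's per-frame state log-likelihoods. The abstract does not specify the exact combination rule, so `vote`, `combine_loglik` and the weights are hypothetical illustrations.

```python
from collections import Counter


def vote(hypotheses):
    """Majority vote per word position over equal-length, pre-aligned hypotheses.

    Ties go to the word seen first (Counter keeps first-encountered order).
    """
    return [Counter(words).most_common(1)[0][0] for words in zip(*hypotheses)]


def combine_loglik(model_scores, weights):
    """Log-linear combination of acoustic scores for one frame.

    model_scores: one dict per model, mapping HMM state -> log p(o_t | state).
    weights: one non-negative weight per model (assumed to sum to 1).
    """
    states = model_scores[0].keys()
    return {s: sum(w * scores[s] for w, scores in zip(weights, model_scores))
            for s in states}


# Voting fixes an error made by only one recognizer ...
print(vote([["turn", "left", "now"], ["turn", "left", "now"], ["turn", "lift", "now"]]))
# ... but keeps an error that most recognizers share.
print(vote([["call", "home"], ["tall", "home"], ["tall", "home"]]))
```

The two `print` calls output `['turn', 'left', 'now']` and `['tall', 'home']`: voting can only repair errors on which the recognizers disagree. Combining the scores inside the search instead lets every model influence each frame's decision before the systems commit to a word, and it needs only one decoding pass rather than one per recognizer.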
Bibliographic reference. Fischer, V. / Kunzmann, S. J. (2001): "Bayesian information criterion based multi-style training and likelihood combination for robust hands free speech recognition in the car", In HSC2001, 99-102.