Log-linear model combination is the standard approach in large-vocabulary continuous speech recognition (LVCSR) for combining several knowledge sources, usually an acoustic model and a language model. Instead of using a single scaling factor per knowledge source, we make the scaling factor word- and pronunciation-dependent. In this work, we combine three acoustic models, a pronunciation model, and a language model for a Mandarin broadcast news/broadcast conversation (BN/BC) task. The achieved error rate reduction of 2% relative is small but consistent across two test sets. An analysis of the results shows that the major contribution comes from the improved interdependency of the language and acoustic models.
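The difference between one scaling factor per knowledge source and a word-dependent scaling factor can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names, the model keys (`am`, `lm`), the example tokens, and all scores and scaling factors are hypothetical, and falling back to the global factor when a token has no factor of its own is an assumption.

```python
def combine_global(log_scores, global_scales):
    """Log-linear combination with one scaling factor per knowledge source.

    log_scores:    dict mapping model name -> log-score of the token
    global_scales: dict mapping model name -> scaling factor
    """
    return sum(global_scales[model] * score for model, score in log_scores.items())


def combine_token_dependent(token, log_scores, global_scales, token_scales):
    """Log-linear combination where the scaling factor may depend on the token.

    token may be a word or a (word, pronunciation) pair.
    token_scales maps token -> {model name -> scaling factor}.
    Assumption: tokens or models without a specific factor use the global one.
    """
    specific = token_scales.get(token, {})
    return sum(
        specific.get(model, global_scales[model]) * score
        for model, score in log_scores.items()
    )


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
log_scores = {"am": -10.0, "lm": -2.0}
global_scales = {"am": 1.0, "lm": 10.0}
token_scales = {"word_a": {"lm": 8.0}}  # only the LM factor of "word_a" differs

score_global = combine_global(log_scores, global_scales)
score_word_a = combine_token_dependent("word_a", log_scores, global_scales, token_scales)
score_word_b = combine_token_dependent("word_b", log_scores, global_scales, token_scales)
```

In this sketch `word_b` has no factor of its own, so its combined score equals the globally scaled score, while `word_a` is scored with its own language model factor. How the factors are actually estimated is not covered here.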
Cite as: Hoffmeister, B., Liang, R., Schlüter, R., Ney, H. (2009) Log-linear model combination with word-dependent scaling factors. Proc. Interspeech 2009, 248-251, doi: 10.21437/Interspeech.2009-87
@inproceedings{hoffmeister09_interspeech,
  author={Björn Hoffmeister and Ruoying Liang and Ralf Schlüter and Hermann Ney},
  title={{Log-linear model combination with word-dependent scaling factors}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={248--251},
  doi={10.21437/Interspeech.2009-87}
}