8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Comparison of ML, MAP, and VB based Acoustic Models in Large Vocabulary Speech Recognition

Panu Juhani Somervuo

Helsinki University of Technology, Finland

The present work compares three methods for training acoustic models in a Finnish large vocabulary speech recognition system. The models are trained using the maximum likelihood (ML), maximum a posteriori (MAP), and variational Bayesian (VB) principles. The results show that when the model complexity is properly chosen, all three methods give similar performance. As the model complexity increases, the performance of the ML-based system starts to degrade, whereas no overfitting is observed with the MAP- and VB-based models. MAP gives slightly better recognition accuracy than VB, but it cannot be used for model selection without auxiliary data. The advantage of VB is that it can be used to select a well-performing model structure using only the training data.

Bibliographic reference. Somervuo, Panu Juhani (2004): "Comparison of ML, MAP, and VB based acoustic models in large vocabulary speech recognition", in INTERSPEECH-2004, 701-704.