INTERSPEECH 2004 - ICSLP
The present work compares three methods for training acoustic models in a Finnish large vocabulary speech recognition system. The models are trained using the maximum likelihood (ML), maximum a posteriori (MAP), and variational Bayesian (VB) principles. The results show that when the model complexity is properly chosen, all three methods give similar performance. As the model complexity increases, the performance of the ML-based system starts to degrade, whereas no overfitting is observed with the MAP- and VB-based models. MAP gives slightly better recognition accuracy than VB, but it cannot be used for model selection without auxiliary data. The advantage of VB is that it can be used to select a well-performing model structure using only the training data.
Bibliographic reference. Somervuo, Panu Juhani (2004): "Comparison of ML, MAP, and VB based acoustic models in large vocabulary speech recognition", In INTERSPEECH-2004, 701-704.