Discriminative training applied to hidden Markov model (HMM) design can yield significant benefits in recognition accuracy and model compactness. However, compared to Maximum Likelihood-based methods, discriminative training typically requires much more computation, as all competing candidates must be considered, not just the correct one. The choice of algorithm used to optimize the discriminative criterion function is thus a key issue. We investigated several algorithms and used them for discriminative training based on the Minimum Classification Error (MCE) framework. In particular, we examined on-line, batch, and semi-batch Probabilistic Descent (PD), as well as Quickprop, Rprop, and BFGS. We describe each algorithm and present comparative results on the TIMIT phone classification task and on the 230-hour Corpus of Spontaneous Japanese (CSJ) 30k-word continuous speech recognition task.
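To make the setting concrete, here is a minimal, illustrative Python/NumPy sketch, not taken from the paper: a smoothed MCE loss for a single training token, and a simplified Rprop-style update of per-parameter step sizes. All names, signatures, and default values (eta, gamma, the step bounds) are assumptions made for illustration only.

import numpy as np

def mce_loss(g, correct, eta=1.0, gamma=1.0):
    """Smoothed Minimum Classification Error loss for one token.

    g       : 1-D array of discriminant scores (e.g., HMM log-likelihoods),
              one per class
    correct : index of the correct class
    eta     : exponent weighting the competing classes
    gamma   : slope of the sigmoid smoothing the 0/1 error
    """
    competitors = np.delete(g, correct)
    # Soft-max over competing classes approximates the best rival's score.
    anti = np.log(np.mean(np.exp(eta * competitors))) / eta
    d = -g[correct] + anti                      # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))     # near 0 if correct, near 1 if not

def rprop_step(grad, prev_grad, step, lo=1e-6, hi=1.0, up=1.2, down=0.5):
    """One simplified Rprop update: grow the per-parameter step size when the
    gradient sign is stable, shrink it when the sign flips, then move against
    the gradient sign (Rprop uses only the sign, not the magnitude)."""
    sign_change = np.sign(grad) * np.sign(prev_grad)
    step = np.where(sign_change > 0, np.minimum(step * up, hi), step)
    step = np.where(sign_change < 0, np.maximum(step * down, lo), step)
    delta = -np.sign(grad) * step
    return delta, step

In a batch setting, one such Rprop update would be applied per pass over the data using gradients of the accumulated MCE loss, whereas on-line PD would instead nudge the parameters after each individual token.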
Cite as: Le Roux, J., McDermott, E. (2005) Optimization methods for discriminative training. Proc. Interspeech 2005, 3341-3344, doi: 10.21437/Interspeech.2005-858
@inproceedings{roux05_interspeech,
  author={Jonathan Le Roux and Erik McDermott},
  title={{Optimization methods for discriminative training}},
  year={2005},
  booktitle={Proc. Interspeech 2005},
  pages={3341--3344},
  doi={10.21437/Interspeech.2005-858}
}