Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Optimization Methods for Discriminative Training

Jonathan Le Roux (1), Erik McDermott (2)

(1) University of Tokyo, Japan; (2) NTT Corporation, Japan

Discriminative training applied to hidden Markov model (HMM) design can yield significant benefits in recognition accuracy and model compactness. However, compared to Maximum Likelihood based methods, discriminative training typically requires much more computation, as all competing candidates must be considered, not just the correct one. The choice of the algorithm used to optimize the discriminative criterion function is thus a key issue. We investigated several algorithms and used them for discriminative training based on the Minimum Classification Error (MCE) framework. In particular, we examined on-line, batch, and semi-batch Probabilistic Descent (PD), as well as Quickprop, Rprop and BFGS. We describe each algorithm and present comparative results on the TIMIT phone classification task and on the 230-hour Corpus of Spontaneous Japanese (CSJ) 30K-word continuous speech recognition task.
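Among the optimizers the abstract lists, Rprop is notable for adapting a per-parameter step size from the sign of successive gradients alone. The following is a minimal illustrative sketch of that idea on a toy scalar objective, not the paper's implementation; the function names and hyperparameter values are assumptions chosen for clarity.

```python
import numpy as np

def rprop(grad_fn, x0, n_iter=200, eta_plus=1.2, eta_minus=0.5,
          step_init=0.1, step_min=1e-6, step_max=1.0):
    """Simplified Rprop: each parameter's step size grows when the
    gradient keeps its sign and shrinks when the sign flips; updates
    use only the gradient's sign, never its magnitude."""
    x = np.asarray(x0, dtype=float)
    step = np.full_like(x, step_init)
    prev_grad = np.zeros_like(x)
    for _ in range(n_iter):
        g = grad_fn(x)
        same_sign = g * prev_grad
        # Gradient sign unchanged: accelerate (grow the step).
        step = np.where(same_sign > 0,
                        np.minimum(step * eta_plus, step_max), step)
        # Gradient sign flipped: we overshot, so shrink the step.
        step = np.where(same_sign < 0,
                        np.maximum(step * eta_minus, step_min), step)
        x -= np.sign(g) * step
        prev_grad = g
    return x

# Toy quadratic f(x) = (x - 3)^2, gradient 2(x - 3); minimum at x = 3.
x_min = rprop(lambda x: 2.0 * (x - 3.0), np.array([0.0]))
```

In the paper's setting, `grad_fn` would be the gradient of the MCE loss with respect to the HMM parameters rather than a toy quadratic; the sign-only update is what makes Rprop insensitive to the widely varying gradient magnitudes such criteria produce.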


Bibliographic reference.  Le Roux, Jonathan / McDermott, Erik (2005): "Optimization methods for discriminative training", In INTERSPEECH-2005, 3341-3344.