13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Simultaneous Discriminative Training and Mixture Splitting of HMMs for Speech Recognition

Muhammad Ali Tahir, Markus Nussbaum-Thom, Ralf Schlüter, Hermann Ney

Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Aachen, Germany

A method is proposed to incorporate mixture density splitting into discriminative training of the acoustic model for speech recognition. The standard approach is to obtain a high-resolution acoustic model by maximum likelihood training and density splitting, and then to improve this model by discriminative training. We choose a log-linear form of the acoustic model because, for a single Gaussian density per triphone state, log-linear maximum mutual information (MMI) optimization is a convex optimization problem; by further splitting and discriminative training of this model, we can obtain a model of higher complexity. It was previously shown that this approach achieves large gains in the objective function and corresponding moderate gains in word error rate on a large-vocabulary corpus. This paper incorporates the state-of-the-art minimum phone error (MPE) training criterion into the framework and shows that, after discriminative splitting, a subsequent log-linear MPE training achieves better results than Gaussian mixture model MPE optimization alone.
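
The convexity argument can be illustrated with a simplified frame-level sketch. The notation is an assumption made for illustration and is not taken from the paper: x_t is the feature vector at frame t, s_t is the aligned state, and λ_s, α_s are the log-linear parameters of state s with first-order features only. The paper's actual criterion and feature set may differ.

```latex
% Illustrative notation (assumed, not the paper's): single density per state.
\begin{align}
p_\Lambda(s \mid x) &=
  \frac{\exp\!\left(\lambda_s^\top x + \alpha_s\right)}
       {\sum_{s'} \exp\!\left(\lambda_{s'}^\top x + \alpha_{s'}\right)} \\
% Negative log-posterior objective: log-sum-exp of affine functions
% minus an affine function, hence convex in \Lambda = \{\lambda_s, \alpha_s\}.
F(\Lambda) &= \sum_t \left[
  \log \sum_{s'} \exp\!\left(\lambda_{s'}^\top x_t + \alpha_{s'}\right)
  - \left(\lambda_{s_t}^\top x_t + \alpha_{s_t}\right) \right] \\
% After splitting, each state s has densities indexed by l (assumed index):
p_\Lambda(s \mid x) &=
  \frac{\sum_{l} \exp\!\left(\lambda_{sl}^\top x + \alpha_{sl}\right)}
       {\sum_{s'} \sum_{l'} \exp\!\left(\lambda_{s'l'}^\top x + \alpha_{s'l'}\right)}
\end{align}
```

In the single-density case the objective is a sum of convex terms. Once densities are split, the numerator contributes the negative of a log-sum-exp, which is concave, so convexity is no longer guaranteed. Under this sketch, that is why the convex single-density solution is a natural starting point for splitting.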

Index Terms: speech recognition, log linear modelling, discriminative training


Bibliographic reference.  Tahir, Muhammad Ali / Nussbaum-Thom, Markus / Schlüter, Ralf / Ney, Hermann (2012): "Simultaneous discriminative training and mixture splitting of HMMs for speech recognition", In INTERSPEECH-2012, 571-574.