16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Semi-Supervised Maximum Mutual Information Training of Deep Neural Network Acoustic Models

Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Johns Hopkins University, USA

Maximum Mutual Information (MMI) is a popular discriminative criterion that has been used in supervised training of acoustic models for automatic speech recognition. However, standard discriminative training is very sensitive to the accuracy of the transcription, and hence its use in a semi-supervised setting requires extensive filtering of data. We show that if the supervision transcripts are not known, the natural analogue of MMI is to minimize the conditional entropy of the lattice of possible transcripts of the data. This is equivalent to maximizing a weighted average of the MMI criterion over different reference transcripts, taking those reference transcripts and their weights from the lattice itself. In this paper we describe experiments in which we applied this method to the semi-supervised training of Deep Neural Network (DNN) acoustic models. In our experimental setup, the proposed method gives up to 0.5% absolute word error rate (WER) improvement over a DNN trained with sMBR on only the transcribed part of the data. This is 37% of the improvement that we would get from sMBR training if we had the transcripts for the untranscribed part of the data.
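The equivalence stated in the abstract can be checked numerically on a toy example. The sketch below is illustrative only and is not the authors' implementation: it assumes the lattice has been flattened into a short list of competing transcripts with made-up joint log-scores log p(O, W), whereas a real lattice encodes far more paths and would be processed with forward-backward style recursions. All function names are hypothetical. It shows that the conditional entropy of the transcripts equals the negative of the posterior-weighted average of the per-transcript MMI objective log P(W | O).

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_posteriors(joint_log_scores):
    """Posterior P(W | O) of each path from its joint log-score log p(O, W)."""
    log_z = log_sum_exp(joint_log_scores)
    return [math.exp(s - log_z) for s in joint_log_scores]


def mmi_objective(joint_log_scores, ref_index):
    """MMI objective log P(W_ref | O) when path ref_index is the reference."""
    return joint_log_scores[ref_index] - log_sum_exp(joint_log_scores)


def lattice_conditional_entropy(joint_log_scores):
    """Conditional entropy H(W | O) = -sum_W P(W | O) log P(W | O)."""
    post = path_posteriors(joint_log_scores)
    return -sum(p * math.log(p) for p in post if p > 0.0)


def posterior_weighted_mmi(joint_log_scores):
    """Average of the MMI objective over references, weighted by P(W | O)."""
    post = path_posteriors(joint_log_scores)
    return sum(p * mmi_objective(joint_log_scores, i) for i, p in enumerate(post))


# Made-up joint log-scores for three competing transcripts (assumption).
toy_scores = [-10.0, -11.0, -13.0]
entropy = lattice_conditional_entropy(toy_scores)
weighted = posterior_weighted_mmi(toy_scores)
print(entropy, -weighted)  # the two values agree up to rounding
```

Because the two quantities differ only in sign, minimizing the entropy and maximizing the weighted MMI average lead to the same parameter updates in this simplified setting.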


Bibliographic reference. Manohar, Vimal / Povey, Daniel / Khudanpur, Sanjeev (2015): "Semi-supervised maximum mutual information training of deep neural network acoustic models", in INTERSPEECH-2015, 2630-2634.