5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Efficient Computation of MMI Neural Networks for Large Vocabulary Speech Recognition Systems

Jörg Rottland, Andre Ludecke, Gerhard Rigoll

Duisburg University, Germany

This paper describes how to train Maximum Mutual Information Neural Networks (MMINN) efficiently using a new topology. Large vocabulary speech recognition systems based on a hybrid MMI/connectionist HMM combination have shown good performance on several tasks (RM and WSJ). MMINNs are trained to maximize the mutual information between the index of the winning output neuron (Winner-Takes-All network) and the phonetic class of the corresponding acoustic frame. One major problem of MMI neural networks is the high computational effort needed for training. This effort is proportional to the input and output size of the neural network and to the number of training samples. This paper presents two approaches that show how these long training times can be reduced with very low or even no loss in recognition accuracy. This is achieved by using phonetic knowledge to build a network topology based on phonetic classes.
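The training criterion named in the abstract can be illustrated with a small sketch. It only estimates the mutual information between winner indices and phonetic labels from frame-level counts; it is not the authors' training procedure or topology. The function names `winner_index` and `mutual_information`, and the toy labels, are assumptions made for illustration.

```python
import math
from collections import Counter


def winner_index(activations):
    """Winner-Takes-All: return the index of the largest output activation."""
    return max(range(len(activations)), key=lambda i: activations[i])


def mutual_information(winners, phones):
    """Estimate I(winner; phone) in bits from paired frame-level labels.

    winners: winning output neuron index per frame (hypothetical input)
    phones:  phonetic class label per frame (hypothetical input)
    """
    if len(winners) != len(phones):
        raise ValueError("winners and phones must have the same length")
    n = len(winners)
    if n == 0:
        raise ValueError("at least one frame is required")

    joint = Counter(zip(winners, phones))  # counts N(w, c)
    count_w = Counter(winners)             # counts N(w)
    count_c = Counter(phones)              # counts N(c)

    mi = 0.0
    for (w, c), n_wc in joint.items():
        # p(w, c) * log2( p(w, c) / (p(w) * p(c)) ), written with counts
        mi += (n_wc / n) * math.log2(n_wc * n / (count_w[w] * count_c[c]))
    return mi


# Toy data: four frames, two output neurons, two phonetic classes.
frames = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
winners = [winner_index(f) for f in frames]          # [0, 0, 1, 1]
informative = mutual_information(winners, ["a", "a", "b", "b"])
uninformative = mutual_information(winners, ["a", "b", "a", "b"])
```

In this toy case the winner index determines the class exactly in the first call, so the estimate is 1 bit, while in the second call the winner carries no information about the class and the estimate is 0 bits. Training an MMINN pushes the network toward the first situation.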

Bibliographic reference. Rottland, Jörg / Ludecke, Andre / Rigoll, Gerhard (1998): "Efficient computation of MMI neural networks for large vocabulary speech recognition systems", in ICSLP-1998, paper 0331.