Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Speaker Adaptation Using Regularization And Network Adaptation For Hybrid MMI-NN/HMM Speech Recognition

Jörg Rottland, Christoph Neukirchen, Daniel Willett, Gerhard Rigoll

Department of Computer Science, Faculty of Electrical Engineering, Gerhard-Mercator-University Duisburg, Germany

This paper describes how to perform speaker adaptation for a hybrid large-vocabulary speech recognition system. The hybrid system is based on a Maximum Mutual Information Neural Network (MMINN), which is used as a Vector Quantizer (VQ) for a discrete HMM speech recognizer. The combination of MMINNs and HMMs has shown good performance on several large-vocabulary speech recognition tasks such as RM and WSJ. This paper presents two approaches to speaker adaptation with this hybrid system. The first approach is a transformation of the feature space, performed by a neural network that uses maximum likelihood (ML) as the objective function for the complete system; that is, the parameters of the NN are estimated so as to match the HMM parameters of the pretrained speaker-independent system. The second approach adapts the HMM parameters depending on the amount of training data available per HMM, using a regularization approach. Both approaches can be applied jointly, which further improves the recognition accuracy.
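To illustrate the idea of adapting HMM parameters according to how much adaptation data each model has seen, the following is a minimal sketch and not the authors' formulation. It assumes a simple count-based interpolation between the speaker-independent discrete emission probabilities and the speaker-specific VQ label counts; the function name `adapt_emissions` and the weight `tau` are hypothetical and do not come from the paper.

```python
def adapt_emissions(si_probs, counts, tau=10.0):
    """Blend speaker-independent emission probabilities with adaptation counts.

    si_probs: speaker-independent probabilities over VQ labels for one HMM state.
    counts:   how often each VQ label was observed for this state in the
              adaptation data.
    tau:      hypothetical regularization weight (an assumption); larger values
              keep the result closer to the speaker-independent model.
    """
    total = sum(counts)
    # With little data (total << tau) the prior dominates; with much data
    # (total >> tau) the result approaches the relative frequencies.
    return [(c + tau * p) / (total + tau) for p, c in zip(si_probs, counts)]


# A state with no adaptation data keeps its speaker-independent distribution.
unseen = adapt_emissions([0.5, 0.5], [0, 0])

# A state with ten observations of the first label shifts toward that label.
seen = adapt_emissions([0.5, 0.5], [10, 0], tau=10.0)
```

In this sketch, states with few observations stay close to the pretrained speaker-independent system, while well-observed states move toward the new speaker's statistics.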


Bibliographic reference.  Rottland, Jörg / Neukirchen, Christoph / Willett, Daniel / Rigoll, Gerhard (1999): "Speaker adaptation using regularization and network adaptation for hybrid MMI-NN/HMM speech recognition", In EUROSPEECH'99, 219-222.