Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Robust Speaker Adaptation of Continuous Density HMMs Using Multilayer Perceptron Network

Mikko Harju (1), Petri Salmela (1), Olli Viikki (2), Mikko Lehtokangas (1), Jukka Saarinen (1)

(1) Tampere University of Technology, Signal Processing Laboratory, Tampere, Finland
(2) Nokia Research Center, Speech and Audio Systems Laboratory, Tampere, Finland

This paper compares the performance of global affine and nonlinear transformations for speaker adaptation in a hidden Markov model (HMM) speech recognition system. The nonlinear transformation was obtained with a multilayer perceptron (MLP) network, which was trained during the adaptation process to transform the mean vectors of the HMMs such that the output probabilities of the HMMs for the adaptation utterances were maximized. The performance of the MLP adaptation method was compared to the maximum likelihood linear regression (MLLR) adaptation procedure. Both methods were tested in a connected digit speech recognition system using multi-environment models. The results show that the nonlinear MLP transformation clearly outperforms MLLR in terms of adaptation speed. Moreover, the performance of MLP adaptation with larger amounts of data was comparable to the MLLR performance.
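
The two kinds of mean transformation compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hidden-layer tanh architecture, the single diagonal Gaussian per state, the fixed frame-to-state alignment, and all names (`affine_transform`, `mlp_transform`, `log_likelihood`) are assumptions made for illustration. The sketch only evaluates the adaptation objective (the log-likelihood of the adaptation frames under the transformed means); training the transformation parameters to maximize it is omitted.

```python
import numpy as np

def affine_transform(means, A, b):
    """Global affine map applied to every mean vector: mu' = A mu + b."""
    return means @ A.T + b

def mlp_transform(means, W1, b1, W2, b2):
    """One-hidden-layer perceptron applied to every mean vector.
    The tanh hidden layer and linear output are assumed, not taken from the paper."""
    hidden = np.tanh(means @ W1.T + b1)
    return hidden @ W2.T + b2

def log_likelihood(frames, state_ids, means, variances):
    """Sum of diagonal-Gaussian log densities of the frames under their
    aligned states (a simplification of the full HMM output probability)."""
    mu = means[state_ids]
    var = variances[state_ids]
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mu) ** 2 / var)
    return float(per_dim.sum())

# Toy adaptation data: the "new speaker" shifts every mean by a constant offset.
rng = np.random.default_rng(0)
dim, n_states, n_frames, n_hidden = 3, 4, 50, 5
means = rng.normal(size=(n_states, dim))
variances = np.ones((n_states, dim))
state_ids = rng.integers(0, n_states, size=n_frames)
shift = np.array([0.5, -0.2, 0.1])
frames = means[state_ids] + shift + 0.1 * rng.normal(size=(n_frames, dim))

# Objective before adaptation and after an affine transform that matches the shift.
baseline = log_likelihood(frames, state_ids, means, variances)
adapted_means = affine_transform(means, np.eye(dim), shift)
adapted = log_likelihood(frames, state_ids, adapted_means, variances)

# An untrained MLP with random weights, shown only to illustrate the interface.
W1 = 0.1 * rng.normal(size=(n_hidden, dim))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(dim, n_hidden))
b2 = np.zeros(dim)
mlp_means = mlp_transform(means, W1, b1, W2, b2)
```

In this toy setting the affine transform that matches the simulated speaker shift raises the log-likelihood above the unadapted baseline. An adaptation procedure of either kind would search for the transformation parameters that maximize the same quantity.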


Bibliographic reference. Harju, Mikko / Salmela, Petri / Viikki, Olli / Lehtokangas, Mikko / Saarinen, Jukka (1999): "Robust speaker adaptation of continuous density HMMs using multilayer perceptron network", In EUROSPEECH'99, 2499-2502.