ISCA Archive Interspeech 2008

XMLLR for improved speaker adaptation in speech recognition

Daniel Povey, Hong-Kwang Jeff Kuo

In this paper we describe a novel technique for adaptation of Gaussian means. The technique is related to Maximum Likelihood Linear Regression (MLLR), but we regress not on the mean itself but on a vector associated with each mean. These associated vectors are initialized by an ingenious technique based on eigen decomposition. When used as the only form of adaptation, this technique outperforms MLLR, even with multiple regression classes and Speaker Adaptive Training (SAT). However, when it is combined with Constrained MLLR (CMLLR) and Vocal Tract Length Normalization (VTLN), the improvements disappear. The combination of two forms of SAT (CMLLR-SAT and MLLR-SAT), which we performed as a baseline, is itself a useful result; we describe it more fully in a companion paper. XMLLR is an interesting approach which we hope may have utility in other contexts, for example in speaker identification.
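The difference between regressing on the mean and regressing on an associated vector can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `affine_adapt`, the array shapes, and the dimension of the associated vectors are assumptions made for illustration, and the eigen-decomposition initialization and the maximum-likelihood estimation of the transform are not shown.

```python
import numpy as np

def affine_adapt(vectors, W):
    """Apply a speaker transform W to each row of `vectors`, extended with a 1.

    vectors: (J, D) array, one row per Gaussian (hypothetical layout).
    W:       (d, D + 1) array; the last column acts as a bias term.
    Returns a (J, d) array of adapted means.
    """
    ones = np.ones((vectors.shape[0], 1))
    extended = np.hstack([vectors, ones])  # each row becomes [v; 1]
    return extended @ W.T

# MLLR-style use: regress on the means themselves, so D equals d.
means = np.arange(6.0).reshape(3, 2)                 # 3 Gaussians, d = 2
W_mllr = np.hstack([np.eye(2), np.zeros((2, 1))])    # identity transform, zero bias
adapted_mllr = affine_adapt(means, W_mllr)

# XMLLR-style use (assumed form): regress on a separate vector per Gaussian.
# Here D = 4 is an arbitrary illustrative choice and need not equal d.
assoc = np.ones((3, 4))
W_xmllr = np.zeros((2, 5))
W_xmllr[:, -1] = [1.0, 2.0]                          # bias-only transform
adapted_xmllr = affine_adapt(assoc, W_xmllr)
```

In this sketch the two cases share the same affine form; only the regressor changes, which is the relationship to MLLR that the abstract describes.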


doi: 10.21437/Interspeech.2008-380

Cite as: Povey, D., Kuo, H.-K.J. (2008) XMLLR for improved speaker adaptation in speech recognition. Proc. Interspeech 2008, 1705-1708, doi: 10.21437/Interspeech.2008-380

@inproceedings{povey08c_interspeech,
  author={Daniel Povey and Hong-Kwang Jeff Kuo},
  title={{XMLLR for improved speaker adaptation in speech recognition}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1705--1708},
  doi={10.21437/Interspeech.2008-380}
}