9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

XMLLR for Improved Speaker Adaptation in Speech Recognition

Daniel Povey, Hong-Kwang Jeff Kuo

IBM T.J. Watson Research Center, USA

In this paper we describe a novel technique for adaptation of Gaussian means. The technique is related to Maximum Likelihood Linear Regression (MLLR), but we regress not on the mean itself but on a vector associated with each mean. These associated vectors are initialized by an ingenious technique based on eigendecomposition. As the only form of adaptation, this technique outperforms MLLR, even with multiple regression classes and Speaker Adaptive Training (SAT). However, when it is combined with Constrained MLLR (CMLLR) and Vocal Tract Length Normalization (VTLN), the improvements disappear. The combination of two forms of SAT (CMLLR-SAT and MLLR-SAT), which we performed as a baseline, is itself a useful result; we describe it more fully in a companion paper. XMLLR is an interesting approach which we hope may have utility in other contexts, for example in speaker identification.
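
The contrast drawn in the abstract, regressing on the mean itself versus on a vector associated with each mean, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the helper name `affine`, the toy dimensions, the numeric values, the use of an offset column in the transform, and the choice to give the associated vector a different dimension from the mean. The eigendecomposition-based initialization of the associated vectors is not shown, since the abstract does not specify it.

```python
# Illustrative sketch only: all names, dimensions, and values are hypothetical.

def affine(W, v):
    """Apply W = [A | b] to vector v, returning A v + b."""
    extended = list(v) + [1.0]  # append 1 so the last column of W acts as an offset
    return [sum(w * x for w, x in zip(row, extended)) for row in W]

# MLLR: the speaker-specific transform acts on the Gaussian mean itself.
mu_j = [1.0, 2.0]                 # speaker-independent mean (dimension 2)
W_mllr = [[1.0, 0.0, 0.5],        # 2 x (2 + 1) speaker-specific matrix
          [0.0, 1.0, -0.5]]
mllr_mean = affine(W_mllr, mu_j)

# XMLLR, as described in the abstract: the transform acts on a vector x_j
# associated with the mean. Here x_j is assumed to have dimension 3.
x_j = [0.5, -1.0, 2.0]
W_xmllr = [[1.0, 0.0, 0.5, 0.0],  # 2 x (3 + 1) speaker-specific matrix
           [0.0, 2.0, 0.0, 1.0]]
xmllr_mean = affine(W_xmllr, x_j)

print(mllr_mean)
print(xmllr_mean)
```

In both cases the adapted mean lives in the feature space (dimension 2 in this toy example); what changes is the quantity the speaker transform is applied to.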


Bibliographic reference. Povey, Daniel / Kuo, Hong-Kwang Jeff (2008): "XMLLR for improved speaker adaptation in speech recognition", In INTERSPEECH-2008, 1705-1708.