9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Fast Speaker Adaptive Training for Speech Recognition

Daniel Povey, Hong-Kwang Jeff Kuo, Hagen Soltau

IBM T.J. Watson Research Center, USA

In this paper we describe various fast and convenient implementations of Speaker Adaptive Training (SAT) for use in training when Maximum Likelihood Linear Regression (MLLR) is to be used at test time to adapt the Gaussian means. The memory and disk requirements of most of these implementations are similar to those of normal Maximum Likelihood (ML) training; in all cases, the computation is dominated by the need to compute the MLLR transforms. MLLR is commonly combined with Constrained MLLR (CMLLR), which can be viewed as a feature-space affine transform and has its own form of SAT (we will call this CMLLR-SAT); we experiment with combining the two forms of SAT. We find that MLLR-SAT gives improvements even on top of CMLLR-SAT.
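The abstract distinguishes MLLR, which adapts only the Gaussian means, from CMLLR, which can be viewed as a feature-space affine transform. The Python below is a minimal illustrative sketch of that distinction, not the authors' implementation. It applies an MLLR mean update of the form A·mu + b, and it checks numerically that CMLLR applied to the features (with its log|det A| Jacobian term) gives the same log-likelihood as the equivalent constrained model-space transform, in which the mean and covariance share one matrix. All helper names, the 2-D full-covariance Gaussian, and the numeric values are hypothetical. Transform estimation and the SAT re-estimation of the canonical model are not shown.

```python
import math

# Hypothetical pure-Python 2x2 helpers so the sketch runs without NumPy.
def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

def gauss_logpdf(x, mu, sigma):
    """Log density of a 2-D Gaussian with full covariance sigma."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    si = inv2(sigma)
    quad = d[0] * (si[0][0] * d[0] + si[0][1] * d[1]) + d[1] * (si[1][0] * d[0] + si[1][1] * d[1])
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det2(sigma)) + quad)

def mllr_mean(A, b, mu):
    """MLLR mean adaptation: mu_hat = A mu + b (covariance left unchanged)."""
    Amu = matvec(A, mu)
    return [Amu[0] + b[0], Amu[1] + b[1]]

def cmllr_feature_loglik(A, b, x, mu, sigma):
    """CMLLR as a feature-space transform: score A x + b, add log|det A| Jacobian."""
    Ax = matvec(A, x)
    xt = [Ax[0] + b[0], Ax[1] + b[1]]
    return gauss_logpdf(xt, mu, sigma) + math.log(abs(det2(A)))

def cmllr_model_loglik(A, b, x, mu, sigma):
    """Equivalent constrained model-space form: mean and covariance both use A^{-1}."""
    Ai = inv2(A)
    mu_hat = matvec(Ai, [mu[0] - b[0], mu[1] - b[1]])
    sigma_hat = matmul(matmul(Ai, sigma), transpose(Ai))
    return gauss_logpdf(x, mu_hat, sigma_hat)

# Hypothetical example values for one speaker's transform and one Gaussian.
A = [[1.2, 0.3], [-0.1, 0.9]]
b = [0.5, -0.2]
mu = [1.0, 2.0]
sigma = [[1.0, 0.2], [0.2, 0.5]]
x = [0.7, 1.4]

print(mllr_mean(A, b, mu))
print(cmllr_feature_loglik(A, b, x, mu, sigma), cmllr_model_loglik(A, b, x, mu, sigma))
```

The two CMLLR log-likelihoods agree because this transform can be applied to the features once, rather than to every Gaussian. An MLLR mean transform has no such feature-space form, so it must be applied in the model.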


Bibliographic reference.  Povey, Daniel / Kuo, Hong-Kwang Jeff / Soltau, Hagen (2008): "Fast speaker adaptive training for speech recognition", In INTERSPEECH-2008, 1245-1248.