September 22-25, 1997
Speaker Adaptive Training (SAT) has been investigated for mixture density estimation and applied to large vocabulary continuous speech recognition. SAT integrates MLLR adaptation into HMM training and aims at reducing inter-speaker variability to obtain enhanced speaker-independent models. Starting from BBN's work on compact models, we derive a one-pass Viterbi formulation of SAT that performs joint estimation of the MLLR-based transformations and the density parameters. The computational complexity is analyzed, and an approximation based on applying inverse affine transformations to the observations is discussed. Compared to applying MLLR to standard SI models, our approach achieves lower error rates as well as reduced decoding costs, for both supervised batch and unsupervised incremental adaptation. In the latter case, it is shown that the enrollment of a new speaker can be sped up by selecting, among the transformations estimated from the training speakers, the one that best fits the first test utterance.
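The inverse-transform approximation mentioned above can be illustrated with a small sketch (not the authors' implementation; all matrices and values below are hypothetical). Instead of transforming every Gaussian mean with a speaker-specific affine map mu_s = A mu + b, the observation x can be mapped back to the canonical model space as A^{-1}(x - b); the residual against the untransformed mean then equals the residual in the transformed space up to a fixed linear factor, so only one transform per frame is needed rather than one per density:

```python
# Hedged sketch of the inverse affine transformation idea in SAT/MLLR.
# A, b, mu, x are made-up 2-D values; real systems use full feature dims.

def mat_vec(A, v):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[2.0, 0.0], [0.0, 0.5]]        # hypothetical speaker transform
A_inv = [[0.5, 0.0], [0.0, 2.0]]    # its inverse
b = [1.0, -1.0]                     # hypothetical bias term
mu = [0.3, 0.7]                     # canonical (SI) Gaussian mean
x = [1.5, 0.2]                      # an observed feature vector

# Forward view: adapt the mean, compare in observation space.
mu_s = [mat_vec(A, mu)[i] + b[i] for i in range(2)]
resid_fwd = [x[i] - mu_s[i] for i in range(2)]

# Inverse view: map the observation back, compare in model space.
x_back = mat_vec(A_inv, [x[i] - b[i] for i in range(2)])
resid_inv = mat_vec(A, [x_back[i] - mu[i] for i in range(2)])

# Identity: A(A^{-1}(x - b) - mu) = x - (A mu + b), so both residuals agree.
print(resid_fwd, resid_inv)
```

Since the observation-space transform is shared by all densities, the per-frame cost no longer scales with the number of Gaussians, which is the source of the decoding savings described above.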
Bibliographic reference. Aubert, Xavier / Thelen, Eric (1997): "Speaker adaptive training applied to continuous mixture density modeling", In EUROSPEECH-1997, 1851-1854.