ISCA Archive ASR 2000

Lattice-based unsupervised MLLR for speaker adaptation

Mukund Padmanabhan, George Saon, Geoffrey Zweig

In this paper we explore the use of lattice-based information for unsupervised speaker adaptation. As initially formulated, maximum likelihood linear regression (MLLR) linearly transforms the means of the Gaussian models so as to maximize the likelihood of the adaptation data given either the correct hypothesis (supervised MLLR) or the decoded hypothesis (unsupervised MLLR). For the latter, if the first-pass decoded hypothesis is highly erroneous (as is the case for large-vocabulary telephony applications), MLLR will often find a transform that increases the likelihood of the incorrect models and may even lower the likelihood of the correct hypothesis. Since the oracle word error rate of a lattice is much lower than that of the 1-best or N-best hypotheses, performing adaptation against a word lattice makes it more likely that the correct models are used in estimating the transform. Furthermore, the particular MAP lattice that we propose enables a natural confidence measure given by the posterior occupancy probability of a state: the statistics of a state are updated with the current frame only if the a posteriori probability of the state at that time exceeds a predefined threshold.
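The confidence gating described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, array shapes, and default threshold are all hypothetical, and the lattice state posteriors are assumed to have been computed already (e.g. by forward-backward over the word lattice). Only the accumulation of occupancy-weighted first-order statistics is shown, since these feed the MLLR transform estimation.

```python
import numpy as np

def accumulate_mllr_stats(frames, posteriors, threshold=0.1):
    """Accumulate per-state first-order statistics for MLLR mean adaptation,
    gating each frame by its lattice posterior (hypothetical sketch).

    frames:     (T, D) acoustic feature vectors
    posteriors: (T, S) a posteriori state occupancy probabilities,
                assumed precomputed from the word lattice
    Returns occupancy counts (S,) and weighted feature sums (S, D).
    """
    T, D = frames.shape
    S = posteriors.shape[1]
    occ = np.zeros(S)            # soft occupancy count per state
    first = np.zeros((S, D))     # occupancy-weighted feature sums
    for t in range(T):
        for s in range(S):
            gamma = posteriors[t, s]
            # Confidence gate: use this frame for state s only if its
            # posterior exceeds the predefined threshold.
            if gamma > threshold:
                occ[s] += gamma
                first[s] += gamma * frames[t]
    return occ, first
```

Frames whose posterior for a given state falls below the threshold simply contribute nothing to that state's statistics, which is how low-confidence regions of the decoding are kept out of the transform estimation.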

Experiments performed on a voicemail speech recognition task indicate a 2% relative improvement in word error rate for lattice MLLR over 1-best MLLR.

Cite as: Padmanabhan, M., Saon, G., Zweig, G. (2000) Lattice-based unsupervised MLLR for speaker adaptation. Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium, 128-132

@inproceedings{padmanabhan00_asr2000,
  author={Mukund Padmanabhan and George Saon and Geoffrey Zweig},
  title={{Lattice-based unsupervised MLLR for speaker adaptation}},
  year={2000},
  booktitle={Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium},
  pages={128--132}
}