Adaptation Methods for Speech Recognition

August 29-30, 2001
Sophia Antipolis, France

A Bayesian Prediction Approach to Robust Speech Recognition and Online Speaker Adaptation

Jen-Tzung Chien

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan R.O.C.

Because acoustic environments are uncertain and nonstationary, it is necessary to characterize the uncertainty of the speech hidden Markov models (HMMs) used for recognition and to track that uncertainty sequentially as the environment changes. In this study, we develop a new Bayesian predictive classification (BPC) framework for robust decision and online speaker adaptation. The BPC decision is established by modeling the uncertainties of the HMM mean vector and precision matrix with a conjugate prior density. Frame-based predictive distributions, in the form of multivariate t distributions and approximate Gaussian distributions, are exploited. After recognition, the prior density is pooled with the likelihood of the current sentence to generate a reproducible prior density. The hyperparameters of the prior density are thereby adjusted to the newest environment and applied to the recognition of incoming data. As a result, efficient online unsupervised learning is achieved for speech recognition without separate adaptation data. In the experiments, the proposed approach performs significantly better than the conventional plug-in maximum a posteriori (MAP) decision.
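
The abstract gives no formulas, so the following is only a minimal sketch of the general idea. It assumes a single one-dimensional Gaussian with a textbook normal-gamma conjugate prior on its mean and precision. The class name `NormalGamma`, its fields, and the methods `predictive_logpdf` and `updated` are hypothetical names chosen for illustration; they are not taken from the paper. HMM state alignment, mixture components, and the multivariate precision matrix are all left out. Under these assumptions, the predictive density of a frame is a Student t distribution, and pooling the prior with the frames of a recognized sentence yields a prior of the same family with updated hyperparameters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalGamma:
    """Hypothetical 1-D conjugate prior over a Gaussian mean and precision."""
    m: float      # prior location of the Gaussian mean
    kappa: float  # pseudo-count attached to the mean
    alpha: float  # shape of the Gamma prior on the precision
    beta: float   # rate of the Gamma prior on the precision

    def predictive_logpdf(self, x: float) -> float:
        """Log of the Student t predictive density of one frame x."""
        nu = 2.0 * self.alpha                                   # degrees of freedom
        s2 = self.beta * (self.kappa + 1.0) / (self.alpha * self.kappa)  # squared scale
        z = (x - self.m) ** 2 / (nu * s2)
        return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi * s2)
                - (nu + 1.0) / 2.0 * math.log1p(z))

    def updated(self, frames: list[float]) -> "NormalGamma":
        """Pool the prior with the frames' likelihood (standard conjugate update)."""
        n = len(frames)
        if n == 0:
            return self
        xbar = sum(frames) / n
        scatter = sum((x - xbar) ** 2 for x in frames)
        kappa_n = self.kappa + n
        m_n = (self.kappa * self.m + n * xbar) / kappa_n
        alpha_n = self.alpha + n / 2.0
        beta_n = (self.beta + 0.5 * scatter
                  + self.kappa * n * (xbar - self.m) ** 2 / (2.0 * kappa_n))
        return NormalGamma(m_n, kappa_n, alpha_n, beta_n)


# Illustrative usage with made-up numbers: score a frame, then adapt on a "sentence".
prior = NormalGamma(m=0.0, kappa=1.0, alpha=2.0, beta=2.0)
score_before = prior.predictive_logpdf(2.0)
posterior = prior.updated([1.0, 3.0])      # becomes the prior for the next sentence
score_after = posterior.predictive_logpdf(2.0)
```

In this sketch the updated object is passed forward as the prior for the next utterance, which mirrors the sequential, unsupervised adjustment of hyperparameters that the abstract describes. In an actual recognizer the frames would first have to be assigned to HMM states by the decoder, a step not shown here.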

Bibliographic reference. Chien, Jen-Tzung (2001): "A Bayesian prediction approach to robust speech recognition and online speaker adaptation", in Adaptation-2001, 77-80.