This paper describes a context-dependent acoustic model that supports extremely rapid speaker adaptation. The model, called "Eigencentroid plus Delta Trees" (EDT), incorporates prior knowledge about speaker space and has modest memory requirements. The paper gives the maximum-likelihood formulae for training EDT models and uses a detailed entropy analysis to show how EDT models and speaker-independent models trained on experimental data differ. Phoneme recognition results on the TIMIT database are also given. EDT yields a 12.1% relative error rate reduction (ERR) for supervised adaptation on three sentences, an 11.2% ERR for unsupervised adaptation on three sentences, and a 10.4% ERR for self-adaptation on a single sentence.
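The abstract does not define ERR. The sketch below assumes the usual convention: the drop in error rate from the unadapted baseline to the adapted system, expressed as a percentage of the baseline error rate. The function name and the error rates in the usage lines are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, adapted_error: float) -> float:
    """Relative error rate reduction, in percent.

    Assumes the conventional definition:
        ERR = 100 * (baseline_error - adapted_error) / baseline_error
    Both arguments are error rates in the same unit (for example, percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - adapted_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the arithmetic.
example_baseline = 30.0  # baseline phoneme error rate, in percent
example_adapted = 27.0   # error rate after adaptation, in percent
example_err = relative_error_reduction(example_baseline, example_adapted)
print(f"relative ERR: {example_err:.1f}%")
```

Under this definition, a relative reduction is larger than the corresponding absolute reduction whenever the baseline error rate is below 100%. In the hypothetical case above, a drop of 3 absolute points corresponds to a 10% relative reduction.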
Cite as: Perronnin, F., Kuhn, R., Nguyen, P., Junqua, J.-C. (2001) Maximum-likelihood training of a bipartite acoustic model for speech recognition. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 683-686, doi: 10.21437/Eurospeech.2001-193
@inproceedings{perronnin01_eurospeech,
  author={Florent Perronnin and Roland Kuhn and Patrick Nguyen and Jean-Claude Junqua},
  title={{Maximum-likelihood training of a bipartite acoustic model for speech recognition}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={683--686},
  doi={10.21437/Eurospeech.2001-193}
}