EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Maximum-Likelihood Training of a Bipartite Acoustic Model for Speech Recognition

Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua

Panasonic Speech Technology Laboratory, USA

This paper describes a context-dependent model that supports extremely rapid speaker adaptation. The model, called "Eigencentroid plus Delta Trees" (EDT), incorporates prior knowledge about speaker space and has modest memory requirements. The paper gives the formulae for training EDT models and performs a detailed entropy analysis to show how EDT and speaker-independent models trained on experimental data differ from each other. Phoneme recognition results on the TIMIT database are also given. EDT yields 12.1% relative error rate reduction (ERR) for supervised adaptation on three sentences, 11.2% ERR for unsupervised adaptation on three sentences, and 10.4% ERR for self-adaptation on a single sentence.
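The results above are relative error rate reductions (ERR). The abstract does not define ERR or name the baseline. The sketch below assumes the common definition: the drop in error rate divided by the baseline error rate. It is a minimal Python illustration of that arithmetic, and the function name `relative_err` is hypothetical. The error rates in the usage line are made up to show the calculation. They are not the paper's TIMIT results.

```python
def relative_err(baseline_error: float, adapted_error: float) -> float:
    """Relative error rate reduction, as a percentage of the baseline error rate.

    Assumes the common definition ERR = (E_base - E_adapt) / E_base;
    the paper's exact baseline is not stated in the abstract.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_error - adapted_error) / baseline_error


# Hypothetical phoneme error rates (percent), chosen only to illustrate the arithmetic.
print(round(relative_err(30.0, 27.0), 1))  # 10.0
```

Under this definition, a 10% ERR means one tenth of the baseline errors are removed. It does not mean the error rate falls by 10 absolute percentage points.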


Bibliographic reference.  Perronnin, Florent / Kuhn, Roland / Nguyen, Patrick / Junqua, Jean-Claude (2001): "Maximum-likelihood training of a bipartite acoustic model for speech recognition", In EUROSPEECH-2001, 683-686.