EUROSPEECH 2001 Scandinavia
This paper describes a context-dependent model that supports extremely rapid speaker adaptation. The model, called "Eigencentroid plus Delta Trees" (EDT), incorporates prior knowledge about speaker space and has modest memory requirements. The paper gives the formulae for training EDT models and presents a detailed entropy analysis showing how EDT models and speaker-independent models trained on experimental data differ. Phoneme recognition results on the TIMIT database are also given. EDT yields a 12.1% relative error rate reduction (ERR) for supervised adaptation on three sentences, an 11.2% ERR for unsupervised adaptation on three sentences, and a 10.4% ERR for self-adaptation on a single sentence.
Bibliographic reference. Perronnin, Florent / Kuhn, Roland / Nguyen, Patrick / Junqua, Jean-Claude (2001): "Maximum-likelihood training of a bipartite acoustic model for speech recognition", In EUROSPEECH-2001, 683-686.