We present an approach to rapid speaker adaptation that both reduces inter-speaker variability at the acoustic level and permits dynamic adaptation of the system's reference model. In contrast to other methods, we use a two-level approach: (1) we dynamically normalize speech parameters (formants) to speaker-specific means and variances, and (2) we use an articulatory-based representation situated between the acoustic and phonemic levels. Performance was evaluated on a vocabulary-independent continuous speech task with perplexity 120. We achieved a 9.2% word error rate using only 10 short sentences for adaptation to a new speaker; the error rates for the speaker-dependent and cross-speaker modes are 8.5% and 24.7%, respectively. The results show that the articulatory representation is relatively speaker-invariant and can be "tuned" to a new speaker with only a small number of training samples.
Keywords: two-step speaker-adaptation method, normalized formant features, articulatory-feature vector (AFV), Hidden Markov Models.
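The first adaptation step, normalizing formants to speaker-specific means and variances, can be illustrated with a small sketch. The abstract does not give the paper's actual update rule, so this example assumes a running z-score normalization based on Welford's online algorithm; the class name `RunningFormantNormalizer` and the formant values are hypothetical and chosen only for illustration.

```python
import math


class RunningFormantNormalizer:
    """Online z-score normalization of formant vectors.

    Assumption: per-speaker statistics are accumulated frame by frame
    with Welford's algorithm. This is an illustrative stand-in, not the
    procedure used in the paper.
    """

    def __init__(self, dim):
        self.n = 0                 # number of frames seen so far
        self.mean = [0.0] * dim    # running mean per formant
        self.m2 = [0.0] * dim      # running sum of squared deviations

    def update(self, frame):
        """Fold one frame of formant values into the speaker statistics."""
        self.n += 1
        for i, x in enumerate(frame):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, frame, eps=1e-8):
        """Return the frame scaled to zero mean and unit variance."""
        if self.n < 2:
            # Not enough data to estimate a variance yet.
            return [0.0] * len(frame)
        out = []
        for i, x in enumerate(frame):
            var = self.m2[i] / self.n  # population variance
            out.append((x - self.mean[i]) / math.sqrt(var + eps))
        return out


# Hypothetical (F1, F2) values in Hz from two frames of a new speaker.
norm = RunningFormantNormalizer(dim=2)
norm.update([500.0, 1500.0])
norm.update([700.0, 1700.0])
normalized = norm.normalize([700.0, 1700.0])
```

Because the statistics are updated incrementally, the normalization can follow a new speaker as more frames arrive, which is one plausible reading of "dynamic" normalization in the abstract.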
Cite as: Schmidbauer, O., Höge, H. (1991) Speaker adaptation based on articulatory features. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1099-1102, doi: 10.21437/Eurospeech.1991-261
@inproceedings{schmidbauer91_eurospeech,
  author={O. Schmidbauer and H. Höge},
  title={{Speaker adaptation based on articulatory features}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={1099--1102},
  doi={10.21437/Eurospeech.1991-261}
}