This paper describes an efficient method for unsupervised speaker adaptation. The method (1) selects a subset of speakers who are acoustically close to the test speaker, and (2) computes adapted model parameters from previously stored sufficient statistics of the selected speakers' data. Only a small amount of untranscribed data from the test speaker is required for adaptation. Because the sufficient HMM statistics of the selected speakers are precomputed, adaptation is fast. A pre-clustering method fixes its clusters in advance. The proposed method instead determines the cluster on-line from the test speaker's data, so it can obtain a cluster that better matches that speaker. Experimental results show that the proposed method yields a larger improvement over the speaker-independent model than MLLR does. The proposed method is evaluated in detail and discussed.
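The two steps can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify:

- Each stored speaker has a single diagonal Gaussian that is used only for selection.
- Speakers are ranked by the log-likelihood of the test speaker's frames.
- For each HMM Gaussian component, the stored sufficient statistics are the occupancy count, the first-order sum and the second-order sum.
- The selected speakers' statistics are pooled by plain summation.

The names `SpeakerStats`, `select_speakers` and `pool_statistics` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeakerStats:
    """Hypothetical per-speaker record (assumed layout, not from the paper)."""
    name: str
    sel_mean: list[float]       # mean of the diagonal Gaussian used for selection
    sel_var: list[float]        # variance of that Gaussian
    occ: list[float]            # per HMM component: sum_t gamma_t
    first: list[list[float]]    # per HMM component: sum_t gamma_t * x_t
    second: list[list[float]]   # per HMM component: sum_t gamma_t * x_t**2


def log_likelihood(frames, mean, var):
    """Total log-likelihood of the frames under one diagonal Gaussian."""
    total = 0.0
    for x in frames:
        for xd, m, v in zip(x, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v)
    return total


def select_speakers(frames, speakers, n):
    """Step (1): keep the n speakers whose selection model best explains the test data."""
    ranked = sorted(
        speakers,
        key=lambda s: log_likelihood(frames, s.sel_mean, s.sel_var),
        reverse=True,
    )
    return ranked[:n]


def pool_statistics(selected):
    """Step (2): sum the stored statistics and re-estimate the mean and variance of each component."""
    n_comp = len(selected[0].occ)
    dim = len(selected[0].first[0])
    means, variances = [], []
    for k in range(n_comp):
        occ = sum(s.occ[k] for s in selected)
        mu = [sum(s.first[k][d] for s in selected) / occ for d in range(dim)]
        ex2 = [sum(s.second[k][d] for s in selected) / occ for d in range(dim)]
        # Variance = E[x^2] - mean^2. A real system would also apply a variance floor.
        variances.append([e - m * m for e, m in zip(ex2, mu)])
        means.append(mu)
    return means, variances
```

Adaptation in this sketch needs only sums over stored counts, with no pass over the reference speakers' raw data. This is the source of the speed the abstract attributes to using sufficient statistics. In the sketch, selection happens per test speaker at run time. A pre-clustering approach would fix the groups in advance instead.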
Cite as: Yoshizawa, S., Baba, A., Matsunami, K., Mera, Y., Yamada, M., Lee, A., Shikano, K. (2001) Evaluation on unsupervised speaker adaptation based on sufficient HMM statictics of selected speakers. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1219-1222, doi: 10.21437/Eurospeech.2001-317
@inproceedings{yoshizawa01_eurospeech, author={Shinichi Yoshizawa and Akira Baba and Kanako Matsunami and Yuichirou Mera and Miichi Yamada and Akinobu Lee and Kiyohiro Shikano}, title={{Evaluation on unsupervised speaker adaptation based on sufficient HMM statictics of selected speakers}}, year=2001, booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)}, pages={1219--1222}, doi={10.21437/Eurospeech.2001-317} }