Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Speaker Adaptation Based on Transfer Vector Field Smoothing with Continuous Mixture Density HMMs

Kazumi Ohkura, Masahide Sugiyama, Shigeki Sagayama

ATR Interpreting Telephony Research Laboratories, Kyoto, Japan

This paper describes a method of speaker adaptation for continuous mixture density HMMs (CDHMMs). Speaker adaptation in CDHMMs is treated as a retraining problem in which only a small amount of training data is available. The Vector Field Smoothing (VFS) method is used to cope with retraining from insufficient data, and it is applied simultaneously to inter-speaker and speaking-style adaptation. In the experiments, the standard speaker is male and the target speakers for adaptation are one male and one female. When 11 adaptation sentences are uttered phrase by phrase rather than word by word, the 23-phoneme recognition rate is 87.4% (without adaptation: 47.3%), and the phrase recognition rate with HMM-LR is 85.1% (without adaptation: 21.5%).
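The abstract names VFS without spelling it out. In general terms, VFS treats the shift of each Gaussian mean from the standard speaker to the new speaker as a "transfer vector". Vectors estimated from the adaptation data are smoothed, and vectors for Gaussians that the adaptation data never reached are interpolated from nearby estimated ones. The sketch below illustrates only that general idea and is not the paper's exact formulation. The function name `vfs_adapt`, the neighbour count `k`, the width `f`, and the exponential distance weighting are all hypothetical choices made for illustration.

```python
import numpy as np


def vfs_adapt(ref_means, trained, k=3, f=1.0):
    """Illustrative transfer-vector interpolation and smoothing (hypothetical sketch).

    ref_means: (M, D) Gaussian mean vectors of the standard speaker.
    trained:   dict {gaussian_index: adapted_mean} for Gaussians that were
               re-estimated from the adaptation data.
    k:         number of nearest trained Gaussians used (assumed parameter).
    f:         width of the exp(-distance / f) weight (assumed parameter).
    Returns an (M, D) array of adapted means for all Gaussians.
    """
    ref_means = np.asarray(ref_means, dtype=float)
    idx = np.array(sorted(trained))
    # Transfer vectors: adapted mean minus standard-speaker mean (trained Gaussians only).
    vecs = np.array([np.asarray(trained[i], dtype=float) - ref_means[i] for i in idx])

    adapted = np.empty_like(ref_means)
    for m in range(len(ref_means)):
        # Distances from this Gaussian to every trained Gaussian, in the standard-speaker space.
        d = np.linalg.norm(ref_means[idx] - ref_means[m], axis=1)
        nn = np.argsort(d)[:k]
        # Exponential weights; subtracting the minimum distance avoids underflow
        # and does not change the normalized weights.
        w = np.exp(-(d[nn] - d[nn].min()) / f)
        # Weighted average of neighbouring transfer vectors. For a trained Gaussian
        # this smooths its own vector (it is its own nearest neighbour); for an
        # untrained Gaussian it interpolates one.
        v = (w[:, None] * vecs[nn]).sum(axis=0) / w.sum()
        adapted[m] = ref_means[m] + v
    return adapted
```

As a quick check of the behaviour: if every trained Gaussian shifts by the same vector, every Gaussian, trained or not, ends up shifted by that vector.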


Bibliographic reference.  Ohkura, Kazumi / Sugiyama, Masahide / Sagayama, Shigeki (1992): "Speaker adaptation based on transfer vector field smoothing with continuous mixture density HMMs", In ICSLP-1992, 369-372.