ISCA Archive ICSLP 2000

A study of vocal tract length normalization with generation-dependent acoustic models

Keiko Fujita, Yoshio Ono, Yoshihisa Nakatoh

In this paper, we propose a new speaker adaptation algorithm that employs vocal tract length normalization (VTLN) with generation-dependent acoustic models, and demonstrate its validity on speakers of various generations, including small children and aged people.

Children and aged people have characteristic pronunciations. For example, children, whose articulatory organs are still developing, often articulate imperfectly. Aged people also have distinctive pronunciations caused by age-related changes such as the loss of their natural teeth. On the other hand, VTL cannot be estimated from a speaker's generation alone, since it depends more on the individual speaker than on generation. Although children tend to have shorter VTLs than adults and aged people, the exact VTL of each speaker cannot be estimated without analyzing that speaker's voice.

Based on these observations about generation-specific features, we propose VTLN with generation-dependent acoustic models as a speaker adaptation method suited to speakers of various generations, and discuss the effect of the proposed method. Our results show that the proposed method reduces the word error rate (WER) by 52% for aged people and by 63% for children.
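As a rough illustration of the general idea, and not the authors' implementation, the sketch below combines a generic piecewise-linear VTLN frequency warp with a grid search that jointly picks a warp factor and a generation-dependent model for one speaker. Everything specific here is an assumption: the cutoff ratio, the warp-factor grid, the formant values, the joint (model, warp) search itself, and the hypothetical names `warp_frequency`, `select_model_and_warp`, and `toy_score`. The toy formant-distance score stands in for the acoustic log-likelihood that a recognizer would compute on warped features.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def warp_frequency(f: float, alpha: float, f_nyq: float = 8000.0,
                   cutoff: float = 0.875) -> float:
    """Generic piecewise-linear VTLN warp (assumed form, not from the paper).

    Frequencies below a break point are scaled by alpha; above it, a straight
    line maps the remaining band so that f_nyq maps onto itself.
    """
    f0 = cutoff * f_nyq * min(1.0, 1.0 / alpha)  # break frequency
    if f <= f0:
        return alpha * f
    return alpha * f0 + (f_nyq - alpha * f0) * (f - f0) / (f_nyq - f0)


def select_model_and_warp(score: Callable[[str, float], float],
                          models: Sequence[str],
                          alphas: Sequence[float]) -> Tuple[str, float]:
    """Grid search: return the (model, alpha) pair with the highest score."""
    _, best_model, best_alpha = max(
        ((score(m, a), m, a) for m in models for a in alphas),
        key=lambda t: t[0],
    )
    return best_model, best_alpha


# Hypothetical reference formants (Hz) standing in for generation-dependent models.
REFERENCE: Dict[str, List[float]] = {
    "adult": [500.0, 1500.0, 2500.0],
    "aged": [520.0, 1450.0, 2450.0],
}

# Hypothetical warp-factor grid: 0.80, 0.82, ..., 1.20.
ALPHAS = [round(0.80 + 0.02 * i, 2) for i in range(21)]

# Toy speaker: "aged" formants shifted upward as if by a shorter vocal tract.
observed = [f / 0.9 for f in REFERENCE["aged"]]


def toy_score(model: str, alpha: float) -> float:
    # Negative squared error between warped observed formants and model formants;
    # a real system would score warped spectral features with the acoustic model.
    warped = [warp_frequency(f, alpha) for f in observed]
    return -sum((w - r) ** 2 for w, r in zip(warped, REFERENCE[model]))


model, alpha = select_model_and_warp(toy_score, list(REFERENCE), ALPHAS)
print(model, alpha)  # aged 0.9
```

Searching the warp factor and the model together lets the warp absorb individual VTL differences while the model choice captures generation-specific pronunciation; in practice the score would come from decoding or aligning warped spectral features rather than from three formants.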


Cite as: Fujita, K., Ono, Y., Nakatoh, Y. (2000) A study of vocal tract length normalization with generation-dependent acoustic models. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 706-709

@inproceedings{fujita00_icslp,
  author={Keiko Fujita and Yoshio Ono and Yoshihisa Nakatoh},
  title={{A study of vocal tract length normalization with generation-dependent acoustic models}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 706-709}
}