Sixth International Conference on Spoken Language Processing
In this paper, we propose a new speaker adaptation algorithm that combines vocal tract length normalization (VTLN) with generation-dependent acoustic models, and demonstrate its validity on speakers of various generations, including small children and aged people.
Children and aged people have particular features in their pronunciation. For example, children, whose articulatory organs are still growing, often have deficits in articulation. Aged people also have distinctive pronunciations caused by effects of aging, such as the loss of their original teeth. On the other hand, vocal tract length (VTL) cannot be estimated from a speaker's generation alone, since VTL depends more on the speaker's individuality than on generation. Although children tend to have shorter VTLs than adults and aged people, the exact VTL of each speaker cannot be estimated without analyzing that speaker's voice.
Based on these observations about generation features, we propose VTLN with generation-dependent acoustic models as a speaker adaptation method suitable for various generations, and discuss the effect of the proposed method. Our results show that the proposed method reduces the word error rate (WER) by 52% for aged people and by 63% for children.
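To illustrate how VTLN might be combined with generation-dependent models, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the warping function or the selection criterion, so the piecewise-linear warp, the knee ratio, the grid of warping factors, and the `score` callable (standing in for an acoustic-model likelihood) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple


def warp_frequency(freq_hz: float, alpha: float,
                   nyquist_hz: float = 8000.0,
                   knee_ratio: float = 0.85) -> float:
    """Piecewise-linear frequency warp (an assumed, commonly used form).

    Below the knee frequency the axis is scaled by `alpha`; above it a
    second linear segment maps the Nyquist frequency onto itself.
    """
    # Place the knee so that alpha * knee never exceeds knee_ratio * nyquist.
    knee = knee_ratio * nyquist_hz / max(1.0, alpha)
    if freq_hz <= knee:
        return alpha * freq_hz
    slope = (nyquist_hz - alpha * knee) / (nyquist_hz - knee)
    return alpha * knee + slope * (freq_hz - knee)


def select_model_and_warp(score: Callable[[str, float], float],
                          model_names: Iterable[str],
                          alphas: Iterable[float]) -> Tuple[str, float]:
    """Grid search over (generation model, warping factor) pairs.

    `score(model, alpha)` is a hypothetical callable returning how well the
    speaker's warped features fit the given generation-dependent model
    (for example, a log-likelihood). The best-scoring pair is returned.
    """
    alphas = list(alphas)
    candidates = ((score(m, a), m, a) for m in model_names for a in alphas)
    _, best_model, best_alpha = max(candidates, key=lambda t: t[0])
    return best_model, best_alpha


if __name__ == "__main__":
    # Toy score: favors the "child" model and a warping factor near 1.1.
    def toy(m: str, a: float) -> float:
        return -(a - 1.1) ** 2 + (1.0 if m == "child" else 0.0)

    print(select_model_and_warp(toy, ["child", "adult", "aged"],
                                [0.9, 1.0, 1.1, 1.2]))
```

In this sketch the generation model and the warping factor are chosen jointly from the speaker's own speech, which reflects the point above that the warping factor has to come from analysis of each speaker's voice rather than from the generation label alone.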
Bibliographic reference. Fujita, Keiko / Ono, Yoshio / Nakatoh, Yoshihisa (2000): "A study of vocal tract length normalization with generation-dependent acoustic models", In ICSLP-2000, vol.3, 706-709.