International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Some Issues on the Study of Vocal Tract Normalization

Zhuo Wang, Peng Ding, Bo Xu

Chinese Academy of Sciences, Beijing, China

Vocal tract normalization (VTN) is an effective way to reduce inter-speaker variability, which is caused mainly by differences in vocal tract shape among speakers of different genders and age groups. In this paper, some practical implementation issues of VTN are discussed. We adopted a method to train the model and select the proper normalization scale for each speaker. The acoustic model is first estimated from the unnormalized acoustic vectors of a large number of speakers by maximum likelihood training. We then use this gender-independent model to select the proper normalization scale for each speaker. These two steps are repeated. For VTN in training, we discuss the drift of the warp parameter as the number of iterations and the number of mixtures of the acoustic model increase. We studied the distribution of the warp parameter across genders and age groups. To speed up warp parameter selection, we propose a hierarchical method and compare it with traditional methods.
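
The abstract does not specify the hierarchical selection method, so the following is only a minimal sketch of one plausible reading: a coarse-to-fine grid search over the warp parameter, compared with an exhaustive grid search. The function names, the warp range 0.88 to 1.12, the step sizes, and the `score` callable (standing in for the likelihood of a speaker's warped features under the acoustic model) are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple


def grid(lo: float, hi: float, step: float) -> List[float]:
    """Evenly spaced warp candidates from lo to hi, rounded to avoid float noise."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 6) for i in range(n + 1)]


def full_search(score: Callable[[float], float],
                lo: float = 0.88, hi: float = 1.12,
                step: float = 0.02) -> Tuple[float, int]:
    """Exhaustive search: score every candidate, return (best warp, evaluations)."""
    scores: Dict[float, float] = {a: score(a) for a in grid(lo, hi, step)}
    return max(scores, key=scores.get), len(scores)


def hierarchical_search(score: Callable[[float], float],
                        lo: float = 0.88, hi: float = 1.12,
                        coarse_step: float = 0.06,
                        fine_step: float = 0.02) -> Tuple[float, int]:
    """Coarse-to-fine search (hypothetical): scan a coarse grid, then refine
    on a fine grid around the best coarse candidate only."""
    scores: Dict[float, float] = {a: score(a) for a in grid(lo, hi, coarse_step)}
    center = max(scores, key=scores.get)
    f_lo = max(lo, center - coarse_step)
    f_hi = min(hi, center + coarse_step)
    for a in grid(f_lo, f_hi, fine_step):
        if a not in scores:  # do not re-score candidates already evaluated
            scores[a] = score(a)
    return max(scores, key=scores.get), len(scores)


def toy_score(alpha: float) -> float:
    """Toy stand-in for a log-likelihood, peaked at a warp of 1.04."""
    return -(alpha - 1.04) ** 2


best_full, evals_full = full_search(toy_score)
best_hier, evals_hier = hierarchical_search(toy_score)
print(best_full, evals_full)
print(best_hier, evals_hier)
```

With a score that has a single peak, both searches pick the same warp value while the coarse-to-fine variant scores fewer candidates. If the likelihood has several local maxima, the coarse pass can miss the global one, so the coarse step would need to be chosen with care.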



Bibliographic reference. Wang, Zhuo / Ding, Peng / Xu, Bo (2002): "Some issues on the study of vocal tract normalization", in ISCSLP 2002, paper 91.