16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Age-Dependent Height Estimation and Speaker Normalization for Children's Speech Using the First Three Subglottal Resonances

Jinxi Guo (1), Rohit Paturi (1), Gary Yeung (1), Steven M. Lulich (2), Harish Arsikere (3), Abeer Alwan (1)

(1) University of California at Los Angeles, USA
(2) Indiana University, USA
(3) Xerox Research Center India, India

This paper proposes an age-dependent scheme for automatic height estimation and speaker normalization of children's speech, using the first three subglottal resonances (SGRs). Consistent with previous work, our analysis indicates that children above the age of 11 years show different acoustic properties from those under 11. An age-dependent model is therefore investigated. The estimation algorithms for the first three SGRs are motivated by our previous research on adults. The algorithms for the first two SGRs (Sg1 and Sg2) have been applied to children's speech before; this paper proposes a similar approach to estimate the third (Sg3) for children. The algorithm is trained and evaluated on 46 children, aged 6 to 17 years, using cross-validation. Average RMS errors in estimating Sg1, Sg2 and Sg3 using the age-dependent model are 51, 128 and 168 Hz, respectively. The height estimation algorithm exploits the negative correlation between SGRs and height, and the mean absolute height estimation error was found to be less than 3.8 cm for the younger children and 4.9 cm for the older children. In addition, on TIDIGITS, a linear frequency warping scheme using age-dependent Sg3 gives statistically significant word error rate reductions (up to 26%) relative to conventional VTLN.
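
As a rough illustration of the linear frequency warping mentioned in the abstract, the Python sketch below scales the frequency axis by a single factor derived from Sg3. The abstract does not give the warping formula, so the ratio used here (a reference Sg3 divided by the speaker's estimated Sg3), the function names and the example frequencies are all illustrative assumptions rather than details taken from the paper.

```python
def warp_factor(sg3_speaker_hz, sg3_reference_hz):
    """Return a linear warp factor from two Sg3 values.

    Assumption: the factor is reference Sg3 divided by speaker Sg3.
    The paper's exact definition is not stated in the abstract.
    """
    if sg3_speaker_hz <= 0 or sg3_reference_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return sg3_reference_hz / sg3_speaker_hz


def warp_frequencies(freqs_hz, alpha):
    """Apply linear frequency warping: each frequency f maps to alpha * f."""
    return [alpha * f for f in freqs_hz]


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the arithmetic easy to follow.
    alpha = warp_factor(sg3_speaker_hz=2500.0, sg3_reference_hz=2000.0)
    print(alpha)
    print(warp_frequencies([500.0, 1000.0, 2500.0], alpha))
```

With a speaker Sg3 above the reference, as would be expected for a child relative to an adult-trained model, the factor is below 1 and the speaker's spectrum is compressed toward the reference.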


Bibliographic reference. Guo, Jinxi / Paturi, Rohit / Yeung, Gary / Lulich, Steven M. / Arsikere, Harish / Alwan, Abeer (2015): "Age-dependent height estimation and speaker normalization for children's speech using the first three subglottal resonances", in INTERSPEECH-2015, 1665-1669.