Most commonly used model adaptation techniques apply a linear or affine transformation to the models or features to address the gross acoustic mismatch between adults' and children's speech data. Because a linear transformation alone may not adequately model every source of acoustic mismatch, this work explores the efficacy of our recently proposed explicit acoustic normalization (of pitch and speaking rate) in combination with existing normalization and adaptation techniques for children's speech recognition under mismatched conditions. The study shows that explicitly normalizing the pitch and speaking rate of children's speech further improves the effectiveness of the adaptation methods. For children's speech recognized with models trained on adults' speech, explicit acoustic normalization yields significant relative improvements of 13% and 5% over combined vocal tract length normalization (VTLN) and constrained maximum likelihood linear regression (CMLLR) on connected digit and continuous speech recognition tasks, respectively.
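To make the notion of an affine transformation on features concrete, the following is a minimal sketch of applying x' = A x + b to each feature frame of an utterance. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-dimensional frames, and the values of A and b are hypothetical placeholders, whereas in a CMLLR-style system A and b would be estimated from adaptation data.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine_transform(frame: Vector, A: Matrix, b: Vector) -> Vector:
    """Return A * frame + b for a single feature frame (hypothetical helper)."""
    if len(A) != len(b) or any(len(row) != len(frame) for row in A):
        raise ValueError("A must be (len(b) x len(frame))")
    # Each output dimension is a dot product of one row of A with the frame,
    # plus the corresponding bias term.
    return [
        sum(a * x for a, x in zip(row, frame)) + bias
        for row, bias in zip(A, b)
    ]


def transform_utterance(frames: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Apply the same affine transform to every frame of an utterance."""
    return [affine_transform(frame, A, b) for frame in frames]


if __name__ == "__main__":
    # Placeholder values for illustration only; they are not estimated
    # from any speech data.
    A = [[1.0, 0.0], [0.0, 2.0]]
    b = [0.5, -1.0]
    utterance = [[1.0, 2.0], [0.0, 0.0]]
    print(transform_utterance(utterance, A, b))
```

A single transform of this form shifts and scales the feature space globally, which is why sources of mismatch such as pitch and speaking rate may need to be handled separately, as the abstract argues.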
Bibliographic reference. Ghai, Shweta / Sinha, Rohit (2010): "Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization", In INTERSPEECH-2010, 522-525.