11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Enhancing Children's Speech Recognition Under Mismatched Condition by Explicit Acoustic Normalization

Shweta Ghai, Rohit Sinha

IIT Guwahati, India

Most commonly used model adaptation techniques apply a linear or affine transformation to the models or features to address the gross acoustic mismatch between adults' and children's speech data. Since not all sources of acoustic mismatch may be adequately modeled by a linear transformation alone, this work explores, for mismatched children's speech recognition, the efficacy of our recently proposed explicit acoustic (pitch and speaking rate) normalization in combination with existing normalization/adaptation techniques. The study shows that explicitly normalizing the pitch and speaking rate of children's speech further improves the effectiveness of the adaptation methods. For children's speech recognition on models trained on adults' speech, explicit acoustic normalization yields significant relative improvements of 13% and 5% over combined VTLN and CMLLR on the connected digit and continuous speech recognition tasks, respectively.
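Two quantities in this abstract can be made concrete with a short sketch. First, feature-space adaptation such as CMLLR maps each acoustic feature vector x to A x + b. Second, the reported gains are relative improvements, i.e. the reduction in word error rate (WER) divided by the baseline WER. The Python below is a minimal illustration under those two assumptions only; the function names `affine_transform` and `relative_improvement` are hypothetical, and nothing here reproduces the authors' pitch or speaking-rate normalization.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine_transform(x: Vector, A: Matrix, b: Vector) -> Vector:
    """Return A x + b for one feature vector x (hypothetical helper).

    This is the general form of a feature-space affine transform, as used
    by CMLLR-style adaptation; A has one row per output dimension.
    """
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction with respect to the baseline, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Scale each dimension by 2 and shift by (+1, -1).
    print(affine_transform([1.0, 2.0], [[2.0, 0.0], [0.0, 2.0]], [1.0, -1.0]))
    # Hypothetical WERs, not figures from the paper.
    print(round(relative_improvement(10.0, 8.7), 2))
```

For example, a hypothetical baseline WER of 10.0% reduced to 8.7% is a 13% relative improvement, even though the absolute reduction is only 1.3 percentage points.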


Bibliographic reference.  Ghai, Shweta / Sinha, Rohit (2010): "Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization", In INTERSPEECH-2010, 522-525.