ISCA Archive Interspeech 2009

On the use of pitch normalization for improving children's speech recognition

Rohit Sinha, Shweta Ghai

In this work, we study the effect of pitch variation across speech signals in the context of automatic speech recognition. Our initial study on vowel data indicates that, because the filterbank insufficiently smooths pitch harmonics, particularly for high-pitched signals, the variance of the mel frequency cepstral coefficient (MFCC) features increases significantly with the pitch of the speech signal. To reduce the variance in MFCC features caused by pitch differences among speakers, we explore a maximum-likelihood-based explicit pitch normalization method. On a connected digit recognition task, pitch normalization yields a 15% relative improvement over the baseline for children's speech (higher pitch) recognized with models trained on adults' speech (lower pitch).
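
The harmonic-smoothing argument can be illustrated with a rough sketch that compares the width of each triangular mel filter with the spacing between pitch harmonics, which equals the fundamental frequency F0. This is not the authors' analysis. The filter count (23), the 0 to 4000 Hz range, the common 2595·log10(1 + f/700) mel formula, and the example F0 values of 100 Hz and 300 Hz are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def hz_to_mel(f):
    # Common mel-scale approximation (an assumption; the paper's exact scale is not stated here).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_edges(n_filters=23, f_low=0.0, f_high=4000.0):
    """Return (lower, center, upper) frequencies in Hz for each triangular mel filter.

    Filter count and frequency range are illustrative assumptions.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    pts = [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
    return [(pts[i], pts[i + 1], pts[i + 2]) for i in range(n_filters)]

def filters_narrower_than_f0(f0, edges):
    """Count filters whose base width is below the harmonic spacing f0.

    Such a filter can contain at most one harmonic, so its output tracks
    individual harmonics instead of a smoothed spectral envelope.
    """
    return sum(1 for lower, _, upper in edges if (upper - lower) < f0)

if __name__ == "__main__":
    edges = filter_edges()
    for f0 in (100.0, 300.0):  # illustrative low-pitch and high-pitch values
        n = filters_narrower_than_f0(f0, edges)
        print(f"F0 = {f0:.0f} Hz: {n} of {len(edges)} filters narrower than the harmonic spacing")
```

Under these assumptions, more of the low-frequency filters fall below the harmonic spacing as F0 rises, which is consistent with the observation that the filterbank smooths pitch harmonics less effectively for high-pitched speech.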


doi: 10.21437/Interspeech.2009-202

Cite as: Sinha, R., Ghai, S. (2009) On the use of pitch normalization for improving children's speech recognition. Proc. Interspeech 2009, 568-571, doi: 10.21437/Interspeech.2009-202

@inproceedings{sinha09_interspeech,
  author={Rohit Sinha and Shweta Ghai},
  title={{On the use of pitch normalization for improving children's speech recognition}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={568--571},
  doi={10.21437/Interspeech.2009-202}
}