Improving Children’s Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion

W. Ahmad, S. Shahnawazuddin, H.K. Kathania, Gayadhar Pradhan, A.B. Samaddar


Transcribing children's speech using statistical models trained on adults' speech is very challenging. The large mismatch between the acoustic and linguistic attributes of the training and test data is reported to degrade recognition performance. In such speech recognition tasks, the difference in pitch (or fundamental frequency) between the two groups of speakers is one of several mismatch factors. To overcome the pitch mismatch, an existing pitch scaling technique based on iterative spectrogram inversion is explored in this work. Explicit pitch scaling is found to improve the recognition of children's speech under the mismatched setup. In addition, we study the effect of discarding the phase information during spectrum reconstruction. This is motivated by the fact that the dominant acoustic feature extraction techniques use the magnitude spectrum only. When evaluated under the mismatched testing scenario, the existing and the modified pitch scaling techniques yield very similar recognition performance. Furthermore, we explore the role of pitch scaling in another speech recognition system, one trained on speech data from both adult and child speakers. Pitch scaling is found to be effective for children's speech recognition in this case as well.
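
As a rough illustration of what iterative spectrogram inversion involves, the following Python sketch estimates a waveform from a magnitude spectrogram alone, in the style of the Griffin-Lim procedure. It is not the authors' implementation: the frame length, overlap, zero-phase starting point, test signal, and the function names `invert_spectrogram` and `magnitude_error` are all assumptions made for this example. The pitch scaling step itself (modifying the spectrogram before inversion) is not shown.

```python
import numpy as np
from scipy.signal import stft, istft

NPERSEG = 256   # frame length in samples (assumed value)
NOVERLAP = 192  # 75% overlap, i.e. a hop of 64 samples (assumed value)


def invert_spectrogram(mag, n_iter=50):
    """Estimate a waveform whose STFT magnitude approximates `mag`.

    Each pass keeps the target magnitude and borrows the phase of the
    STFT of the current waveform estimate (Griffin-Lim style).
    """
    spec = mag.astype(complex)  # zero-phase starting point (assumption)
    for _ in range(n_iter):
        _, x = istft(spec, nperseg=NPERSEG, noverlap=NOVERLAP)
        _, _, est = stft(x, nperseg=NPERSEG, noverlap=NOVERLAP)
        spec = mag * np.exp(1j * np.angle(est))
    _, x = istft(spec, nperseg=NPERSEG, noverlap=NOVERLAP)
    return x


def magnitude_error(x, mag):
    """Relative mismatch between the STFT magnitude of `x` and `mag`."""
    _, _, est = stft(x, nperseg=NPERSEG, noverlap=NOVERLAP)
    return np.linalg.norm(np.abs(est) - mag) / np.linalg.norm(mag)


# Hypothetical test signal: two sinusoids sampled at 16 kHz.
fs = 16000
t = np.arange(4096) / fs
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

# Keep only the magnitude; the phase is discarded on purpose.
_, _, ref = stft(signal, nperseg=NPERSEG, noverlap=NOVERLAP)
target = np.abs(ref)

rough = invert_spectrogram(target, n_iter=1)
refined = invert_spectrogram(target, n_iter=50)
print(magnitude_error(rough, target), magnitude_error(refined, target))
```

The printed pair compares the magnitude mismatch after one pass and after fifty passes; more passes should bring the reconstructed signal's spectrogram closer to the target magnitude.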


DOI: 10.21437/Interspeech.2017-302

Cite as: Ahmad, W., Shahnawazuddin, S., Kathania, H., Pradhan, G., Samaddar, A. (2017) Improving Children’s Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion. Proc. Interspeech 2017, 2391-2395, DOI: 10.21437/Interspeech.2017-302.


@inproceedings{Ahmad2017,
  author={W. Ahmad and S. Shahnawazuddin and H.K. Kathania and Gayadhar Pradhan and A.B. Samaddar},
  title={Improving Children’s Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2391--2395},
  doi={10.21437/Interspeech.2017-302},
  url={http://dx.doi.org/10.21437/Interspeech.2017-302}
}