Pitch-Adaptive Front-End Features for Robust Children’s ASR

S. Shahnawazuddin, Abhishek Dey, Rohit Sinha


In this work, we explore some of the challenges in recognizing children’s speech with automatic speech recognition (ASR) systems trained on adults’ speech. In such mismatched ASR tasks, recognition performance degrades severely because of the gross mismatch in acoustic attributes between the two groups of speakers. Among the various sources of mismatch, we focus on the large difference in average pitch between adult and child speakers. Earlier studies have shown that the Mel-filterbank employed in feature extraction does not smooth out the pitch harmonics sufficiently, particularly for high-pitched child speakers. As a result, the acoustic features derived for adult and child speakers turn out to be significantly mismatched. To address this problem, we propose a simple technique based on adaptive liftering for deriving pitch-robust features. This reduces the sensitivity of the acoustic features to gross variations in pitch across speakers. The proposed features are found to improve the performance of a deep neural network based ASR system. Further, additional gains are noted when existing feature normalization techniques are also applied.
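The abstract does not spell out the liftering procedure, so the following is only a minimal sketch of one plausible reading of pitch-adaptive liftering, not the authors' implementation. It assumes a low-time lifter applied to the real cepstrum of a frame, with the cutoff set to a fraction of the pitch period so that the pitch-related cepstral peak is removed before the spectrum is rebuilt. The function names, the `fraction` parameter, the FFT size, and the synthetic test frame are all hypothetical, and the pitch value `f0` is assumed to come from an external pitch estimator.

```python
import numpy as np


def lifter_cutoff(fs, f0, fraction=0.8):
    """Quefrency cutoff in samples, placed below the pitch period fs / f0.

    `fraction` is an assumed tuning parameter, not a value from the paper.
    """
    return max(1, int(fraction * fs / f0))


def pitch_adaptive_smooth_spectrum(frame, fs, f0, fraction=0.8, n_fft=512):
    """Return (raw log spectrum, liftered log spectrum) for one frame."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)

    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)

    # Low-time lifter: keep quefrencies below the cutoff, and their
    # mirrored counterparts, since the real cepstrum is symmetric.
    cutoff = min(lifter_cutoff(fs, f0, fraction), n_fft // 2)
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    if cutoff > 1:
        lifter[-(cutoff - 1):] = 1.0

    # Back to the log spectral domain; the harmonic ripple is suppressed.
    smoothed = np.fft.rfft(cepstrum * lifter).real
    return log_mag, smoothed


# Synthetic high-pitched frame: an impulse train at roughly 300 Hz.
fs, f0 = 16000, 300.0
frame = np.zeros(400)  # 25 ms at 16 kHz
frame[::int(round(fs / f0))] = 1.0
log_mag, smoothed = pitch_adaptive_smooth_spectrum(frame, fs, f0)
```

Because the cutoff shrinks as the pitch rises, a high-pitched child speaker gets a shorter lifter than a low-pitched adult speaker. In a full front-end, the smoothed spectrum would presumably then pass through the usual Mel-filterbank and cepstral analysis, though that ordering is also an assumption here.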


DOI: 10.21437/Interspeech.2016-1020

Cite as

Shahnawazuddin, S., Dey, A., Sinha, R. (2016) Pitch-Adaptive Front-End Features for Robust Children’s ASR. Proc. Interspeech 2016, 3459-3463.

BibTeX
@inproceedings{Shahnawazuddin+2016,
author={S. Shahnawazuddin and Abhishek Dey and Rohit Sinha},
title={Pitch-Adaptive Front-End Features for Robust Children’s ASR},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1020},
url={http://dx.doi.org/10.21437/Interspeech.2016-1020},
pages={3459--3463}
}