This paper investigates large vocabulary children's speech recognition using two acoustic modeling approaches: the Deep Neural Network - Hidden Markov Model (DNN-HMM) hybrid and the Subspace Gaussian Mixture Model (SGMM). In the investigated scenario, training data is limited to about 7 hours of speech from children aged 7-13, and the test data consists of clean read speech from children in the same age range. To tackle inter-speaker acoustic variability, speaker adaptive training based on feature-space maximum likelihood linear regression (fMLLR) is adopted together with vocal tract length normalization (VTLN). Experimental results show that both the DNN-HMM and SGMM systems achieve very good recognition results, with the best results obtained by the DNN-HMM system.
Bibliographic reference. Giuliani, Diego / BabaAli, Bagher (2015): "Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling", In INTERSPEECH-2015, 1635-1639.