16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Large Vocabulary Children's Speech Recognition with DNN-HMM and SGMM Acoustic Modeling

Diego Giuliani (1), Bagher BabaAli (2)

(1) FBK, Italy
(2) University of Tehran, Iran

This paper investigates large vocabulary children's speech recognition using two acoustic modeling approaches: the Deep Neural Network - Hidden Markov Model (DNN-HMM) hybrid and the Subspace Gaussian Mixture Model (SGMM). In the investigated scenario, training data is limited to about 7 hours of speech from children aged 7-13, and the test data consists of clean read speech from children in the same age range. To tackle inter-speaker acoustic variability, speaker adaptive training based on feature-space maximum likelihood linear regression is adopted, along with vocal tract length normalization. Experimental results show that both the DNN-HMM and SGMM systems achieve very good recognition results, with the best results obtained by the DNN-HMM system.


Bibliographic reference. Giuliani, Diego / BabaAli, Bagher (2015): "Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling", in INTERSPEECH-2015, 1635-1639.