In this work, we study the effect of pitch variation across speech signals in the context of automatic speech recognition. Our initial study on vowel data indicates that, because the filterbank does not sufficiently smooth the pitch harmonics, particularly for high-pitched signals, the variances of the mel frequency cepstral coefficient (MFCC) features increase significantly as the pitch of the speech signal increases. To reduce the variance of the MFCC features caused by pitch differences among speakers, a maximum-likelihood-based explicit pitch normalization method is explored. On a connected digit recognition task, pitch normalization yields a 15% relative improvement over the baseline when children’s speech (higher pitch) is recognized using models trained on adults’ speech (lower pitch).
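The smoothing argument can be illustrated with a rough sketch. It is not the authors' analysis: the filter count (23), the 0 to 4000 Hz band, the common 2595·log10(1 + f/700) mel formula, and the function names are all assumptions chosen for illustration. The idea is that a triangular filter whose base is narrower than the harmonic spacing (the pitch, F0) can contain at most one harmonic, so its output depends on where that harmonic happens to fall instead of averaging over several harmonics.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale approximation (assumed; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(num_filters=23, f_low=0.0, f_high=4000.0):
    """Return num_filters + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Filter k is a triangle spanning edges[k] .. edges[k + 2].
    The default values are illustrative assumptions.
    """
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), num_filters + 2)
    return mel_to_hz(mels)

def filters_narrower_than_pitch(f0, edges):
    """Count filters whose full base width (Hz) is below the harmonic spacing f0."""
    widths = edges[2:] - edges[:-2]
    return int(np.sum(widths < f0))

edges = mel_filter_edges()
for f0 in (100.0, 200.0, 300.0):  # hypothetical adult-like to child-like pitch values
    n = filters_narrower_than_pitch(f0, edges)
    print(f"F0 = {f0:.0f} Hz: {n} filters narrower than the harmonic spacing")
```

Under these assumptions, the number of filters too narrow to average over neighbouring harmonics grows with F0, which is consistent with the larger MFCC variance reported for high-pitched speech. Base width is only a crude proxy for smoothing; it ignores the analysis window and the triangular weighting itself.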
Bibliographic reference. Sinha, Rohit / Ghai, Shweta (2009): "On the use of pitch normalization for improving children's speech recognition", In INTERSPEECH-2009, 568-571.