Speech Recognition and Intrinsic Variation (SRIV2006)
This paper analyzes different ways of introducing the pitch frequency and a voicing parameter into speaker-independent speech recognition systems, in order to check whether their usefulness depends on the way they are modeled. Recognition performance was evaluated on three speaker-independent speech recognition tasks. Modeling pitch and voicing features independently of the MFCC-based acoustic features, through discrete or Gaussian densities, slightly improves results on all three tasks. In contrast, directly introducing pitch and/or voicing features into the acoustic vector leads to significant recognition improvements on the isolated word recognition tasks, but brings no improvement on the continuous speech recognition task. These results could be explained by the fact that the improvement obtained when introducing pitch directly into the acoustic vector is related to a certain dependency between the pitch frequency and the acoustic parameters. This dependency is much stronger in the isolated word recognition tasks than in the continuous speech recognition task, in which prosody can lead to pitch changes that depend on the prosodic context.
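The two modeling strategies contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single Gaussian per HMM state (real systems typically use mixtures), a full covariance matrix for the augmented vector, and a hypothetical stream weight for the pitch density. All function and parameter names are illustrative only.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian with full covariance."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # Mahalanobis distance
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)

def score_independent(mfcc, pitch, mfcc_mean, mfcc_cov,
                      pitch_mean, pitch_var, weight=1.0):
    """Pitch modeled independently of the MFCC features.

    The state score is the MFCC log-likelihood plus a weighted pitch
    log-likelihood (the weight is an assumption of this sketch).
    """
    return (gaussian_logpdf(mfcc, mfcc_mean, mfcc_cov)
            + weight * gaussian_logpdf(pitch, pitch_mean, pitch_var))

def score_augmented(mfcc, pitch, joint_mean, joint_cov):
    """Pitch appended to the acoustic vector and scored jointly.

    Off-diagonal terms of joint_cov can capture a dependency between
    pitch and the acoustic parameters.
    """
    x = np.append(mfcc, pitch)
    return gaussian_logpdf(x, joint_mean, joint_cov)
```

Under these assumptions, the two scores coincide when the joint covariance has no cross terms between pitch and the MFCC dimensions, and differ as soon as such a dependency is modeled. This is one way to picture why the benefit of the augmented vector could hinge on how strongly pitch and the acoustic parameters are related.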
Bibliographic reference. Cloarec, Gwenael / Jouvet, Denis / Monné, Jean (2006): "Analysis of the modeling of pitch and voicing parameters for speaker-independent speech recognition systems", In SRIV-2006, 65-70.