Among the features extracted from the speech signal, some depend directly on the elementary acoustic level, whereas others depend on a suprasegmental level such as the phonetic level. A major deficiency of a standard HMM is that it treats all of this information uniformly. In this paper, we try to resolve this problem using the Two Level HMM, which introduces features according to their informative content, either at the elementary acoustic level or at the phonetic level. Specifically, the incorporation of global sound durations is explored. Moreover, since variations in speaking rate affect sound durations, we propose to adapt the parameters of the sound duration pdf accordingly. Experiments on a French number database show that such an explicit introduction of prosodic parameters improves recognition accuracy.
Bibliographic reference. Suaudeau, Nelly / Andre-Obrecht, Regine (1993): "Sound duration modelling and time-variable speaking rate in a speech recognition system", In EUROSPEECH'93, 307-310.