September 22-25, 1997
In this paper, hybrid HMM/ANN systems are used to model context-dependent phones. To reduce the number of parameters and to better capture the dynamics of the phonetic segments, we combine (context-dependent) diphone models with context-independent phone models. Transitions from phone to phone are modeled as generalized context-dependent distributions, while phonetic units are context-independent models trained on the less coarticulated middle part of each phone. Words are thus modeled as a sequence of probability distributions alternately representing the middle part of the phonemes and the transitions from phone to phone. A single neural network is used to estimate both the context-independent phone probabilities and the generalized context-dependent diphone (phone-to-phone transition) probabilities. The resulting systems are compared to classical context-independent phone-based HMM/ANN systems with the same number of parameters. The Phonebook isolated word database was used to train the systems. Testing is done on small (75 words), medium (600 words) and large (8000 words) lexicons. The test words were not present in the training vocabulary.
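As an illustration of the word topology described above, the following minimal Python sketch expands a phone sequence into alternating phone-middle and phone-to-phone transition units. The function name `word_to_units`, the tuple labels, and the example phone symbols are hypothetical and chosen for illustration only; the grouping of contexts into generalized classes and the neural network that estimates the probabilities are not reproduced here.

```python
def word_to_units(phones):
    """Interleave phone-middle units with phone-to-phone transition units.

    Hypothetical helper: each phone contributes one context-independent
    "phone" unit, and each adjacent pair of phones contributes one
    "transition" unit placed between them.
    """
    units = []
    for i, phone in enumerate(phones):
        # Context-independent unit for the middle part of the phone.
        units.append(("phone", phone))
        # Transition unit to the next phone, if there is one.
        if i + 1 < len(phones):
            units.append(("transition", phone, phones[i + 1]))
    return units


# Example with an assumed three-phone transcription.
for unit in word_to_units(["k", "ae", "t"]):
    print(unit)
```

Under this layout, a word of n phones maps to 2n - 1 units: n phone-middle units and n - 1 transition units.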
Bibliographic reference. Dupont, Stephane / Ris, Christophe / Deroo, Olivier / Fontaine, Vincent / Boite, Jean-Marc / Zanoni, L. (1997): "Context independent and context dependent hybrid HMM/ANN systems for vocabulary independent tasks", In EUROSPEECH-1997, 1947-1950.