This paper focuses on the definition and modeling of robust context-dependent units for flexible-vocabulary recognition. It proposes a new technique for tuning the acoustic resolution of the models, and discusses the advantages of representing phonetic transcriptions as a sequence of stationary context-independent phonemes and diphone-transition coarticulation units rather than as classical diphone or triphone units. When the two techniques are combined, the recognition rate of a speaker-independent recognizer with a vocabulary of 600 surnames increases from 91.2% to 96%, while using fewer than one third of the densities of the original models.
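As a rough illustration of the unit representation described above, the following Python sketch expands a phoneme string into alternating stationary and transition units. It is only one plausible reading of the abstract: the function name `to_stationary_transition_units`, the `a>b` notation for a transition unit, and the example phoneme symbols are assumptions made here for illustration and are not taken from the paper.

```python
from typing import List


def to_stationary_transition_units(phonemes: List[str]) -> List[str]:
    """Expand a phoneme sequence into stationary and transition units.

    Hypothetical helper: each phoneme contributes one context-independent
    stationary unit, and each adjacent phoneme pair contributes one
    transition unit written here as "left>right" (notation assumed).
    """
    units: List[str] = []
    for index, phoneme in enumerate(phonemes):
        if index > 0:
            # Transition unit covering the coarticulated region between
            # the previous phoneme and the current one.
            units.append(f"{phonemes[index - 1]}>{phoneme}")
        # Stationary unit: depends only on the phoneme itself.
        units.append(phoneme)
    return units


if __name__ == "__main__":
    # Illustrative transcription of a surname; symbols are made up.
    print(to_stationary_transition_units(["r", "o", "s", "i"]))
```

Under this reading, an inventory of N phonemes needs at most N stationary units plus N² transition units, whereas a full triphone inventory can grow as N³, which may be one reason such a representation suits vocabularies that are not fixed in advance.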
Bibliographic reference. Fissore, L. / Ravera, F. / Laface, Pietro (1995): "Acoustic-phonetic modeling for flexible vocabulary speech recognition", In EUROSPEECH-1995, 799-802.