Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Minimum Confusibility Training of Context Dependent Demiphones

Albino Nogueiras-Rodríguez, José B. Marino

Research Center TALP, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain

In recent years, two approaches have been widely used to improve acoustic modeling in continuous speech recognition systems: discriminative training algorithms and context dependent subword units. However, while each of these techniques on its own leads to much better results than standard maximum likelihood trained phone models, their combination, i.e. discriminative training of context dependent units, has proved to be a much more difficult task. In this paper we deal with minimum confusibility training of demiphones using the TIMIT database. By applying this approach, recently introduced by the authors, the string error rate in the recognition of TIDIGITS using demiphones is reduced by some 24% with respect to maximum likelihood training. This improvement comes in addition to the 8% reduction already provided by demiphones with respect to minimum confusibility trained phones.


Bibliographic reference.  Nogueiras-Rodríguez, Albino / Marino, José B. (1999): "Minimum confusibility training of context dependent demiphones", In EUROSPEECH'99, 2741-2744.