Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Accuracy Versus Complexity in Context Dependent Phone Modeling

Wei Xu, Jacques Duchateau, Kris Demuynck, Ioannis Dologlou, Patrick Wambacq, Dirk van Compernolle (1), Hugo van Hamme (1)

Katholieke Universiteit Leuven, ESAT PSI, Belgium
(1) Lernout & Hauspie Speech Products, Belgium

This paper presents two approaches to building HMM-based acoustic models that give enough acoustic resolution and still fit within limited user resources. Both scale down acoustic models built with tied-Gaussian HMMs. The total number of Gaussians is reduced by pairwise merging, and the number of Gaussians per state is reduced by selecting them according to the so-called occupancy criterion. Experiments on the WSJ recognition task show that, after scaling down, no further training is needed when the total number of Gaussians or the number of Gaussians per state is reduced by up to a factor of three. This is an advantage because retraining cannot be carried out by the end user of the system.
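The two reduction steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how pairs are chosen for merging or how the merged parameters are computed, so everything here is an assumption: diagonal-covariance Gaussians, an occupancy-weighted moment-matching merge, and a plain top-k selection by per-state occupancy. The names `Gaussian`, `merge_pair` and `select_by_occupancy` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gaussian:
    occupancy: float      # accumulated occupancy count from training (assumed available)
    mean: List[float]
    var: List[float]      # diagonal covariance (assumption)


def merge_pair(a: Gaussian, b: Gaussian) -> Gaussian:
    """Merge two diagonal Gaussians by occupancy-weighted moment matching.

    The merged Gaussian has the same first and second moments as the
    two-component mixture it replaces. This is one common choice, not
    necessarily the one used in the paper.
    """
    total = a.occupancy + b.occupancy
    wa, wb = a.occupancy / total, b.occupancy / total
    mean = [wa * ma + wb * mb for ma, mb in zip(a.mean, b.mean)]
    var = [
        wa * (va + ma * ma) + wb * (vb + mb * mb) - m * m
        for va, ma, vb, mb, m in zip(a.var, a.mean, b.var, b.mean, mean)
    ]
    return Gaussian(total, mean, var)


def select_by_occupancy(state_occ: Dict[int, float], keep: int) -> Dict[int, float]:
    """Keep the `keep` Gaussians with the highest occupancy in one state.

    `state_occ` maps an index into the shared Gaussian pool to that
    Gaussian's occupancy in this state. The kept occupancies are
    renormalised so they can serve as mixture weights.
    """
    top = sorted(state_occ.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    norm = sum(occ for _, occ in top)
    return {idx: occ / norm for idx, occ in top}
```

In this sketch, `merge_pair` shrinks the shared pool by one Gaussian per call, while `select_by_occupancy` leaves the pool untouched and only shortens each state's mixture. Neither step re-estimates parameters from data, which matches the abstract's point that no retraining is required.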


Bibliographic reference.  Xu, Wei / Duchateau, Jacques / Demuynck, Kris / Dologlou, Ioannis / Wambacq, Patrick / Compernolle, Dirk van / Hamme, Hugo van (1999): "Accuracy versus complexity in context dependent phone modeling", In EUROSPEECH'99, 1127-1130.