Sixth European Conference on Speech Communication and Technology
This paper presents two approaches to building HMM acoustic models that provide sufficient acoustic resolution while fitting within limited user resources. Both scale down acoustic models built with tied-Gaussian HMMs. The total number of Gaussians is reduced by pairwise merging, and the number of Gaussians per state is reduced by selecting them according to the so-called occupancy criterion. Experiments on the WSJ recognition task show that, after scaling down, no further training is needed when the total number of Gaussians or the number of Gaussians per state is reduced by up to a factor of three. This is an advantage because retraining cannot be carried out by the end user of the system.
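The two reduction steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the merge is a standard moment-matched combination of two weighted diagonal-covariance Gaussians, the "occupancy criterion" is read as keeping the Gaussians with the largest training occupancy counts in a state, and the function names `merge_gaussians` and `select_by_occupancy` are hypothetical. How the paper chooses which pair to merge is not covered.

```python
import numpy as np

def merge_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched merge of two weighted diagonal-covariance Gaussians.

    Assumption: this is a generic merge rule, not necessarily the one
    used in the paper.
    """
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Weighted second moment of the pair minus the squared merged mean.
    var = (w1 * (var1 + mu1**2) + w2 * (var2 + mu2**2)) / w - mu**2
    return w, mu, var

def select_by_occupancy(occupancies, keep):
    """Keep the `keep` Gaussians with the highest occupancy in one state.

    Assumption: `occupancies` are per-Gaussian occupancy counts for the
    state, accumulated during training. Returns the kept indices (in
    their original order) and mixture weights renormalised to sum to 1.
    """
    occ = np.asarray(occupancies, dtype=float)
    top = np.argsort(occ)[::-1][:keep]   # indices of the largest counts
    idx = np.sort(top)                   # restore original ordering
    weights = occ[idx] / occ[idx].sum()
    return idx, weights

if __name__ == "__main__":
    print(merge_gaussians(1.0, [0.0], [1.0], 1.0, [2.0], [1.0]))
    print(select_by_occupancy([10, 1, 5, 4], keep=2))
```

Because both steps operate only on stored parameters and occupancy counts, neither needs access to the training data, which is consistent with the abstract's point that the reduced models can be used without retraining.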
Bibliographic reference. Xu, Wei / Duchateau, Jacques / Demuynck, Kris / Dologlou, Ioannis / Wambacq, Patrick / Compernolle, Dirk van / Hamme, Hugo van (1999): "Accuracy versus complexity in context dependent phone modeling", In EUROSPEECH'99, 1127-1130.