It is well known that the HMM is ineffective at modeling the dynamics of speech because of its piecewise stationarity and independent-observation assumptions. In this paper, we propose an analytically tractable framework in which the HMM and the segment model are combined to reach a jointly optimal decision in both training and recognition. The combination is achieved by coupling the hidden processes of the HMM and the segment model. To take full advantage of the segmental approach, phone-pair units are used as the basic acoustic units for the segment models. In addition, we construct context-dependent phone-pair units to account for acoustic variation across contexts. The superior quality of the phone-pair segment models contributes to an 8.2% reduction in error rate on the WSJ dictation task.
Cite as: Hon, H.-W., Kumar, S., Wang, K. (2000) Unifying HMM and phone-pair segment models. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 286-289, doi: 10.21437/ICSLP.2000-71
@inproceedings{hon00_icslp,
  author={Hsiao-Wuen Hon and Shankar Kumar and Kuansan Wang},
  title={{Unifying HMM and phone-pair segment models}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 286-289},
  doi={10.21437/ICSLP.2000-71}
}