This paper presents improved HMM/SVM methods for a two-stage phoneme segmentation framework, which tries to imitate the human phoneme segmentation process. The first stage performs hidden Markov model (HMM) forced alignment according to the minimum boundary error (MBE) criterion. The objective is to align a phoneme sequence of a speech utterance with its acoustic signal counterpart based on MBE-trained HMMs and explicit phoneme duration models. The second stage uses the support vector machine (SVM) method to refine the hypothesized phoneme boundaries derived by HMM-based forced alignment. The efficacy of the proposed framework has been validated on two speech databases: the TIMIT English database and the MATBN Mandarin Chinese database.
Bibliographic reference. Kuo, Jen-Wei / Lo, Hung-Yi / Wang, Hsin-Min (2007): "Improved HMM/SVM methods for automatic phoneme segmentation", In INTERSPEECH-2007, 2057-2060.