Several specific tasks in the field of text-to-speech synthesis require a large amount of labeled speech corpora. In most cases, these labels are phone marks aligned with the speech waveform. Different kinds of solutions have been applied to this problem, from rule-based systems to stochastic ones. Here we validate a solution based on Hidden Markov Models. Various test configurations are proposed. At the acoustic level, we compare LSP with MFCC coefficients and assess the fitness of multi-Gaussian densities for this segmentation task. At the topological level, we compare standard left-to-right models with phonologically dependent topologies. The best configuration we found uses an MFCC analysis with standard left-to-right models and diagonal multi-Gaussian densities per state. For this configuration, the overall root mean squared error on the test database is 18 ± 0.3 ms within a 99% confidence interval.
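The root mean squared error quoted above can be illustrated with a minimal sketch. It assumes the error is measured between reference and automatically placed phone boundaries, both given in milliseconds and paired one to one; the abstract does not specify the exact evaluation procedure, and the function name `boundary_rmse_ms` and the sample values are hypothetical.

```python
import math


def boundary_rmse_ms(reference_ms, predicted_ms):
    """Root mean squared error between paired boundary times, in ms.

    Assumption: each predicted boundary corresponds to the reference
    boundary at the same index (same phone sequence, same order).
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    # Squared deviation of each predicted boundary from its reference.
    squared_errors = [(p - r) ** 2 for r, p in zip(reference_ms, predicted_ms)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical boundaries (ms); deviations are +10, -10 and +20 ms.
reference = [0.0, 100.0, 200.0]
predicted = [10.0, 90.0, 220.0]
rmse = boundary_rmse_ms(reference, predicted)
```

With these made-up values the mean squared deviation is 200 ms², so the RMSE is the square root of 200, roughly 14.1 ms. The sketch does not cover how the confidence interval reported in the abstract was obtained.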
Cite as: Nefti, S., Boeffard, O. (2001) Acoustical and topological experiments for an HMM-based speech segmentation system. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1711-1714, doi: 10.21437/Eurospeech.2001-401
@inproceedings{nefti01_eurospeech, author={Samir Nefti and Olivier Boeffard}, title={{Acoustical and topological experiments for an HMM-based speech segmentation system}}, year=2001, booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)}, pages={1711--1714}, doi={10.21437/Eurospeech.2001-401} }