An accurate database, segmented and labeled at the phonetic, sub-word, or word level, is very important for speech research. However, manual segmentation and labeling is a time-consuming and error-prone task. This paper describes an automatic procedure for segmenting speech into a set of acoustic sub-word units: given either the linguistic or the phonetic content of a speech utterance, the system provides the unit boundaries. The technique uses a Hidden Markov Model (HMM) recognizer of acoustic sub-word units to provide a coarse segmentation based on Viterbi alignment, which is later refined by means of an acoustic segmentation and a small set of rules based on acoustic features. These rules represent phonetic knowledge and address the correction of unexpected segmentation errors, which are a major problem of such HMM recognizers. In addition, these rules are useful for analyzing sequences of sounds that include sonorants or several successive vowels. Segmentation experiments have been conducted on a Galician speech database to check the reliability of the resulting system.
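The coarse segmentation step can be illustrated with a minimal sketch of Viterbi forced alignment. This is not the authors' implementation: it assumes one HMM state per sub-word unit, a strict left-to-right topology, and no transition probabilities, and the function name `forced_align` and the input matrix `log_lik` are hypothetical. The per-frame log-likelihoods are assumed to come from already trained unit models. The refinement stage (acoustic segmentation and rules) is not shown.

```python
from typing import List

NEG_INF = float("-inf")


def forced_align(log_lik: List[List[float]]) -> List[int]:
    """Return the start frame of each unit in a known unit sequence.

    log_lik[t][k] is the log-likelihood of frame t under the k-th unit of
    the transcription (assumption: one state per unit, no transition costs).
    """
    n_frames, n_units = len(log_lik), len(log_lik[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")

    # score[t][k]: best log-score of any path that is in unit k at frame t.
    score = [[NEG_INF] * n_units for _ in range(n_frames)]
    # entered[t][k]: True if the best path moved from unit k-1 to k at frame t.
    entered = [[False] * n_units for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]  # every path must start in the first unit

    for t in range(1, n_frames):
        for k in range(n_units):
            stay = score[t - 1][k]
            advance = score[t - 1][k - 1] if k > 0 else NEG_INF
            best = max(stay, advance)
            if best == NEG_INF:
                continue  # unit k is not reachable at frame t
            entered[t][k] = advance > stay
            score[t][k] = best + log_lik[t][k]

    # Backtrack from the last unit at the last frame to recover boundaries.
    starts = [0] * n_units
    k = n_units - 1
    for t in range(n_frames - 1, 0, -1):
        if entered[t][k]:
            starts[k] = t
            k -= 1
    return starts
```

Given six frames and three units, where the first two frames match unit 0, the next three match unit 1, and the last matches unit 2, the function returns the frame index at which each unit begins. In a system like the one described, such boundaries would only be a first estimate to be corrected afterwards.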
Cite as: Docío-Fernández, L., García-Mateo, C. (2000) Automatic segmentation of speech based on hidden Markov models and acoustic features. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 708-711, doi: 10.21437/ICSLP.2000-910
@inproceedings{dociofernandez00_icslp,
  author={Laura Docío-Fernández and Carmen García-Mateo},
  title={{Automatic segmentation of speech based on hidden Markov models and acoustic features}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 4, 708-711},
  doi={10.21437/ICSLP.2000-910}
}