Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Automatic Segmentation of Speech Based on Hidden Markov Models and Acoustic Features

Laura Docío-Fernández, Carmen García-Mateo

E.T.S.E. Telecomunicacion - University of Vigo, Spain

A database accurately segmented and labeled at the phonetic, sub-word or word level is very important for speech research. However, manual segmentation and labeling is a time-consuming and error-prone task. This paper describes an automatic procedure for segmenting speech into a set of acoustic sub-word units: given either the linguistic or the phonetic content of a speech utterance, the system provides the unit boundaries. The technique uses an acoustic sub-word unit Hidden Markov Model (HMM) recognizer to provide a coarse segmentation based on Viterbi alignment, which is later refined by means of an acoustic segmentation and a small set of rules based on acoustic features. These rules represent phonetic knowledge and address the correction of unexpected segmentation errors, which are a major problem of such HMM recognizers. In addition, these rules are useful for analyzing sequences of sounds that include sonorants or several successive vowels. Segmentation experiments have been conducted on a Galician speech database to check the reliability of the resulting system.
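
As an illustration of the coarse segmentation step only, the following is a minimal sketch of Viterbi forced alignment when the unit sequence is already known. It is not the authors' system: it assumes each sub-word unit is modeled by a single state with a self-loop and a forward transition, it ignores transition probabilities, and it takes the per-frame log-likelihoods as given (in practice they would come from the trained HMMs). The names `forced_align` and `loglik` are hypothetical, and the acoustic refinement and rule-based correction described above are not shown.

```python
import math


def forced_align(loglik):
    """Align T frames to U units that must occur in the given order.

    loglik[t][u] is the log-likelihood of frame t under unit u.
    Returns the start frame of each unit (illustrative simplification:
    one state per unit, no transition probabilities).
    """
    T = len(loglik)
    U = len(loglik[0])
    if T < U:
        raise ValueError("need at least one frame per unit")

    NEG = -math.inf
    score = [[NEG] * U for _ in range(T)]
    # back[t][u] is 1 if the best path entered unit u at frame t, else 0
    back = [[0] * U for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first unit

    for t in range(1, T):
        for u in range(U):
            stay = score[t - 1][u]
            move = score[t - 1][u - 1] if u > 0 else NEG
            if move > stay:
                score[t][u] = move + loglik[t][u]
                back[t][u] = 1
            else:
                score[t][u] = stay + loglik[t][u]

    # Backtrace from the last unit at the last frame
    starts = [0] * U
    u = U - 1
    for t in range(T - 1, 0, -1):
        if back[t][u] == 1:
            starts[u] = t
            u -= 1
    return starts


# Toy input: 6 frames, 3 units; values are made up for illustration
GOOD, BAD = -0.1, -5.0
toy = [
    [GOOD, BAD, BAD],
    [GOOD, BAD, BAD],
    [BAD, GOOD, BAD],
    [BAD, GOOD, BAD],
    [BAD, GOOD, BAD],
    [BAD, BAD, GOOD],
]
boundaries = forced_align(toy)
```

The returned start frames play the role of the coarse unit boundaries; a refinement stage such as the one the paper describes would then adjust them using acoustic evidence near each boundary.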



Bibliographic reference. Docío-Fernández, Laura / García-Mateo, Carmen (2000): "Automatic segmentation of speech based on hidden Markov models and acoustic features", in ICSLP-2000, vol. 4, 708-711.