ISCA Archive ICSLP 2000

Automatic segmentation of speech based on hidden Markov models and acoustic features

Laura Docío-Fernández, Carmen García-Mateo

An accurate database segmented and labeled at the phonetic, subword or word level is very important for speech research. However, manual segmentation and labeling is a time-consuming and error-prone task. This paper describes an automatic procedure for segmenting speech into a set of acoustic sub-word units: given either the linguistic or the phonetic content of a speech utterance, the system provides the unit boundaries. The technique uses an acoustic sub-word unit Hidden Markov Model (HMM) recognizer to provide a coarse segmentation based on Viterbi alignment, which is later refined by means of an acoustic segmentation and a small set of rules based on acoustic features. These rules encode phonetic knowledge and correct unexpected segmentation errors, which are a major problem of such HMM recognizers. In addition, the rules are useful for analyzing sequences of sounds that include sonorants or several successive vowels. Segmentation experiments have been conducted on a Galician speech database to check the reliability of the resulting system.
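
The coarse first stage, Viterbi alignment of a known unit sequence against the speech frames, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one state per unit (real sub-word HMMs usually have several states), no transition probabilities, and a precomputed table `loglik[t][u]` holding the log-likelihood of frame `t` under the `u`-th unit of the transcription. The function name `forced_align` and the toy scores are hypothetical. The acoustic refinement and rule-based correction stages described in the abstract are not shown.

```python
def forced_align(loglik):
    """Return the start frame of each unit in the best left-to-right alignment.

    loglik[t][u] is the (assumed, precomputed) log-likelihood of frame t
    under the u-th unit of the known transcription. Every unit must cover
    at least one frame, and units are visited strictly in order.
    """
    n_frames = len(loglik)
    n_units = len(loglik[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")

    neg_inf = float("-inf")
    # score[t][u]: best total log-likelihood of frames 0..t ending in unit u.
    score = [[neg_inf] * n_units for _ in range(n_frames)]
    # entered[t][u]: True if the best path enters unit u exactly at frame t.
    entered = [[False] * n_units for _ in range(n_frames)]
    score[0][0] = loglik[0][0]

    for t in range(1, n_frames):
        for u in range(n_units):
            stay = score[t - 1][u]
            advance = score[t - 1][u - 1] if u > 0 else neg_inf
            if advance > stay:
                best = advance
                entered[t][u] = True
            else:
                best = stay
            if best > neg_inf:
                score[t][u] = best + loglik[t][u]

    # Backtrack from the last frame of the last unit to recover boundaries.
    starts = [0] * n_units
    u = n_units - 1
    for t in range(n_frames - 1, 0, -1):
        if entered[t][u]:
            starts[u] = t
            u -= 1
    return starts


# Toy example: 6 frames, 3 units; each unit clearly matches two frames.
toy_scores = [
    [0, -5, -5],
    [0, -5, -5],
    [-5, 0, -5],
    [-5, 0, -5],
    [-5, -5, 0],
    [-5, -5, 0],
]
boundaries = forced_align(toy_scores)
```

In this toy case the recovered unit start frames are 0, 2 and 4. In a system like the one described, such boundaries would only be a coarse estimate to be refined afterwards.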


Cite as: Docío-Fernández, L., García-Mateo, C. (2000) Automatic segmentation of speech based on hidden Markov models and acoustic features. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 708-711

@inproceedings{dociofernandez00_icslp,
  author={Laura Docío-Fernández and Carmen García-Mateo},
  title={{Automatic segmentation of speech based on hidden Markov models and acoustic features}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={708--711}
}