Sixth International Conference on Spoken Language Processing
An accurate database segmented and labeled at the phonetic, sub-word or word level is very important for speech research. However, manual segmentation and labeling is a time-consuming and error-prone task. This paper describes an automatic procedure for segmenting speech into a set of acoustic sub-word units: given either the linguistic or the phonetic content of a speech utterance, the system provides the unit boundaries. The technique uses a Hidden Markov Model (HMM) recognizer based on acoustic sub-word units to obtain a coarse segmentation by Viterbi alignment, which is then refined by means of an acoustic segmentation and a small set of rules based on acoustic features. These rules encode phonetic knowledge and correct unexpected segmentation errors, which are a major problem of such HMM recognizers. In addition, the rules are useful for analyzing sequences of sounds that include sonorants or several successive vowels. Segmentation experiments have been conducted on a Galician speech database to check the reliability of the resulting system.
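The coarse segmentation step can be illustrated with a minimal sketch of Viterbi forced alignment. This is not the authors' implementation: it assumes one HMM state per unit, no transition probabilities, and a precomputed matrix of frame log-likelihoods, and the function name `forced_align` is hypothetical. The rule-based refinement stage is not shown.

```python
import numpy as np

def forced_align(log_lik):
    """Align T frames to a known sequence of N units (illustrative sketch).

    log_lik[t, k] is the log-likelihood of frame t under the k-th unit of
    the known transcription. Simplifying assumption: each unit is a single
    state in a strict left-to-right chain with no transition costs.
    Returns the start frame of each unit.
    """
    T, N = log_lik.shape
    if T < N:
        raise ValueError("need at least one frame per unit")

    score = np.full((T, N), -np.inf)          # best path score ending at (t, k)
    from_prev = np.zeros((T, N), dtype=bool)  # True if the path entered unit k at frame t
    score[0, 0] = log_lik[0, 0]               # the path must start in the first unit

    for t in range(1, T):
        for k in range(N):
            stay = score[t - 1, k]
            advance = score[t - 1, k - 1] if k > 0 else -np.inf
            from_prev[t, k] = advance > stay
            score[t, k] = max(stay, advance) + log_lik[t, k]

    # Backtrace from the last frame of the last unit to recover boundaries.
    starts = [0] * N
    k = N - 1
    for t in range(T - 1, 0, -1):
        if from_prev[t, k]:
            starts[k] = t
            k -= 1
    return starts

# Toy input: 6 frames, 3 units; a score of 0 marks the unit each frame fits best.
toy = np.full((6, 3), -10.0)
toy[0:2, 0] = 0.0
toy[2:5, 1] = 0.0
toy[5:6, 2] = 0.0
boundaries = forced_align(toy)
```

In a real system the log-likelihoods would come from trained multi-state HMMs, and the boundaries returned here would be the coarse estimates that the acoustic segmentation and rules subsequently adjust.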
Bibliographic reference. Docío-Fernández, Laura / García-Mateo, Carmen (2000): "Automatic segmentation of speech based on hidden Markov models and acoustic features", In ICSLP-2000, vol.4, 708-711.