8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Unsupervised HMM Classification of F0 Curves

Damien Lolive, Nelly Barbot, Olivier Boeffard

IRISA, France

This article describes a new unsupervised methodology for learning F0 classes with HMMs on a syllable basis. An F0 class is represented by an HMM with three emitting states. The clustering algorithm relies on an iterative process of Gaussian splitting and EM retraining. First, a single class is learnt on a training corpus (8000 syllables); it is then divided by perturbing the Gaussian means at successive levels. At each step, the mean RMS error is evaluated on a validation corpus (3000 syllables). The algorithm stops automatically when the error becomes stable or increases. The syllable is the reference level we have taken for F0 modelling, although the methodology can be applied to other structures. Clustering quality is evaluated by cross-validation, using the mean of the RMS errors between the F0 contours of a test corpus and the estimated HMM trajectories. The results show that the classes fit the contours well (mean RMS error around 4 Hz).
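
The split, retrain and stop loop summarised above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation. It assumes that each syllable contour has been resampled to a fixed number of F0 values in Hz, it represents each class by a single mean contour instead of a three-state HMM, and it replaces EM retraining with nearest-mean reassignment and averaging. The function names and the parameters `epsilon`, `tol`, `max_levels` and `n_iter` are hypothetical and chosen only for illustration.

```python
import math


def rms_error(f0, trajectory):
    """RMS error (Hz) between an observed contour and a model trajectory."""
    if len(f0) != len(trajectory):
        raise ValueError("contours must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(f0, trajectory))
    return math.sqrt(squared / len(f0))


def mean_rms_error(contours, class_means):
    """Mean over contours of the RMS error to the closest class mean."""
    total = sum(min(rms_error(c, m) for m in class_means) for c in contours)
    return total / len(contours)


def split_means(class_means, epsilon=1.0):
    """Double the number of classes by shifting each mean by +/- epsilon Hz."""
    result = []
    for mean in class_means:
        result.append([v + epsilon for v in mean])
        result.append([v - epsilon for v in mean])
    return result


def retrain(contours, class_means):
    """Stand-in for EM retraining (assumption): assign each contour to its
    nearest mean, then average each group. Empty classes keep their mean."""
    groups = [[] for _ in class_means]
    for contour in contours:
        nearest = min(range(len(class_means)),
                      key=lambda k: rms_error(contour, class_means[k]))
        groups[nearest].append(contour)
    updated = []
    for group, old_mean in zip(groups, class_means):
        if group:
            updated.append([sum(col) / len(group) for col in zip(*group)])
        else:
            updated.append(old_mean)
    return updated


def cluster(train, valid, epsilon=1.0, tol=0.01, max_levels=5, n_iter=10):
    """Split classes level by level; stop when the validation error no
    longer decreases by more than tol (i.e. it is stable or increases)."""
    means = retrain(train, [train[0]])  # one class: the global mean contour
    best_err = mean_rms_error(valid, means)
    for _ in range(max_levels):
        candidate = split_means(means, epsilon)
        for _ in range(n_iter):
            candidate = retrain(train, candidate)
        err = mean_rms_error(valid, candidate)
        if best_err - err <= tol:
            break  # keep the classes from the previous level
        means, best_err = candidate, err
    return means, best_err
```

Calling `cluster(train, valid)` with two lists of equal-length contours returns the retained class means and their validation error. On toy data made of low and high contours, the loop keeps the first split and rejects the second, because the second split fits the training contours more closely but raises the validation error.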

