11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Modelling Speech Line Spectral Frequencies with Dirichlet Mixture Models

Zhanyu Ma, Arne Leijon

KTH, Sweden

In this paper, we model the underlying probability density function (PDF) of the speech line spectral frequency (LSF) parameters with a Dirichlet mixture model (DMM). The LSF parameters have two special properties: 1) they have a bounded range; 2) they are in increasing order. By transforming the LSF parameters to ΔLSF parameters, the DMM can be used to model the ΔLSF parameters and take advantage of both properties. The distortion-rate (D-R) relation is derived for the Dirichlet distribution under the high-rate assumption. A bit allocation strategy for the DMM is also proposed. In modelling the LSF parameters extracted from the TIMIT database, the DMM performs better than the Gaussian mixture model in terms of D-R relation, likelihood, and model complexity. Since modelling is an essential prerequisite step in PDF-optimized vector quantizer design, better modelling results indicate superior quantization performance.
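
The abstract does not spell out the LSF-to-ΔLSF transformation, so the following is only a minimal sketch under stated assumptions: the LSFs are taken to lie in (0, π), and the ΔLSF vector is assumed to consist of the gaps between adjacent LSFs (including the gaps to 0 and to π), each divided by π. Under that assumption a K-dimensional LSF vector maps to K+1 positive values that sum to one, which is the kind of vector a Dirichlet distribution is defined on. The function names `lsf_to_delta` and `delta_to_lsf` are hypothetical and chosen for illustration.

```python
import math


def lsf_to_delta(lsf):
    """Map K increasing LSFs in (0, pi) to K+1 positive values summing to 1.

    Assumption (not stated in the abstract): each Delta-LSF element is the
    gap between neighbouring LSFs, with 0 and pi as outer edges, over pi.
    """
    edges = [0.0] + list(lsf) + [math.pi]
    delta = [(hi - lo) / math.pi for lo, hi in zip(edges, edges[1:])]
    if any(d <= 0.0 for d in delta):
        raise ValueError("LSFs must be strictly increasing and inside (0, pi)")
    return delta


def delta_to_lsf(delta):
    """Invert lsf_to_delta: accumulate the gaps and rescale by pi."""
    lsf = []
    total = 0.0
    for d in delta[:-1]:  # the last gap only closes the interval at pi
        total += d
        lsf.append(total * math.pi)
    return lsf


# Illustrative values only, not taken from the TIMIT data.
example_lsf = [0.3, 0.7, 1.2, 1.9, 2.6]
example_delta = lsf_to_delta(example_lsf)
recovered_lsf = delta_to_lsf(example_delta)
```

In this sketch both properties named in the abstract become constraints on one vector: the bounded range appears as the sum-to-one condition, and the ordering appears as positivity of every element.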

