Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Speech Coding Using Mixture of Gaussians Polynomial Model

Parham Zolfaghari (1), Tony Robinson (2)

(1) CREST/ATR Human Information Processing Research Labs, Kyoto, Japan
(2) Cambridge University Engineering Department, Cambridge, UK

We have investigated a novel method of spectral estimation based on a mixture of Gaussians in a sinusoidal analysis and synthesis framework. After quantisation of this parametric scheme, a fixed frame-rate coder operating at a bit-rate of around 2.4 kbits/s has been developed. This paper describes an extension to this spectral model in which the parameters of the mixture of Gaussians are constrained to follow a polynomial trajectory over a segment of speech data. This is referred to as the mixture of Gaussians polynomial model (MGPM). To realise a segmental coder, dynamic programming is performed over the utterance. The segmental representation of the spectra yields a log-likelihood score for each segment, which is used as the cost function in the dynamic programming algorithm. Speech coding components such as pitch, voicing and gain are described segmentally. A number of segmental coders are presented with bit-rates in the range of 350 to 650 bits/s. At these bit-rates the coders produce good, intelligible coded speech, as evaluated using DRT scoring.
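
The segmentation step can be illustrated with a minimal sketch. It is not the coder described in the paper. It assumes a single scalar parameter track instead of the full set of mixture parameters, a first-order (straight-line) trajectory, and a stand-in segment score (negative squared fitting error) in place of the MGPM log-likelihood. The names `line_fit_score`, `segment`, `max_len` and `penalty` are hypothetical, and the per-segment penalty is an added assumption that stops the search from splitting the track into trivially short segments.

```python
def line_fit_score(values):
    """Stand-in segment score: negative squared error of a least-squares line.

    Assumption: the paper uses a log-likelihood under the MGPM instead.
    """
    n = len(values)
    if n < 2:
        return 0.0  # a single frame is fitted exactly
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = cov / var_t
    intercept = v_mean - slope * t_mean
    return -sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(values))


def segment(values, max_len=10, penalty=1.0, score=line_fit_score):
    """Split a parameter track into segments by dynamic programming.

    Returns a list of (start, end) frame ranges, end exclusive, that
    maximises the sum of segment scores minus a fixed penalty per segment.
    """
    n = len(values)
    best = [float("-inf")] * (n + 1)  # best[e]: best total score for frames [0, e)
    back = [0] * (n + 1)              # back[e]: start of the last segment ending at e
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(values[start:end]) - penalty
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Trace the stored back-pointers from the last frame to the first.
    bounds = []
    end = n
    while end > 0:
        bounds.append((back[end], end))
        end = back[end]
    bounds.reverse()
    return bounds


if __name__ == "__main__":
    track = [0, 1, 2, 3, 4, 10, 10, 10, 10, 10]
    print(segment(track))
```

In this sketch, replacing `line_fit_score` with a function that returns a model log-likelihood for the segment would bring it closer to the scheme summarised in the abstract. The search itself does not depend on how the score is computed.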


Acoustic Example #1 (350)
Acoustic Example #2 (450)
Acoustic Example #3 (550)
Acoustic Example #4 (650)
Acoustic Example #5 (ORI)

Bibliographic reference. Zolfaghari, Parham / Robinson, Tony (1999): "Speech coding using mixture of Gaussians polynomial model", in EUROSPEECH'99, 1495-1498.