We have investigated a novel method of spectral estimation based on a mixture of Gaussians in a sinusoidal analysis and synthesis framework. After quantisation of this parametric scheme, a fixed frame-rate coder operating at a bit-rate of around 2.4 kbits/s has been developed. This paper describes an extension to this spectral model in which the parameters of the mixture of Gaussians are constrained to follow a polynomial trajectory over a segment of speech data. This is referred to as the mixture of Gaussians polynomial model (MGPM). To realise a segmental coder, dynamic programming is performed over the utterance: the segmental representation of the spectra yields a log-likelihood score for each segment, which is used as the cost function in the dynamic programming algorithm. Speech coding components such as pitch, voicing and gain are also described segmentally. A number of segmental coders are presented with bit-rates in the range of 350 to 650 bits/s. These coders yield good, intelligible coded speech at these bit-rates, as evaluated using Diagnostic Rhyme Test (DRT) scoring.
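The segmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each frame is a plain feature vector and substitutes a hypothetical stand-in score (a unit-variance Gaussian log-likelihood of the residual from a least-squares polynomial fit) for the MGPM segment log-likelihood. The names `segment_score` and `segment_utterance`, the per-segment `penalty`, and the `min_len`/`max_len` limits are all assumptions introduced for illustration.

```python
import numpy as np


def segment_score(frames, order=1):
    """Stand-in segment log-likelihood (assumption, not the MGPM score).

    Fits one polynomial trajectory per feature dimension over the segment
    and returns the unit-variance Gaussian log-likelihood of the residual,
    up to an additive constant.
    """
    n = len(frames)
    t = np.linspace(0.0, 1.0, n)           # normalised time within the segment
    basis = np.vander(t, order + 1)        # columns t^order, ..., t^0
    coef, *_ = np.linalg.lstsq(basis, frames, rcond=None)
    resid = frames - basis @ coef
    return -0.5 * float(np.sum(resid ** 2))


def segment_utterance(frames, min_len=2, max_len=10, order=1, penalty=1.0):
    """Dynamic programming over the utterance.

    best[e] holds the highest total score of any segmentation of frames[:e].
    The per-segment penalty (an assumption) discourages trivially short
    segments, which would otherwise always fit perfectly.
    """
    T = len(frames)
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    for end in range(1, T + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if not np.isfinite(best[start]):
                continue                   # no valid segmentation ends at start
            score = best[start] + segment_score(frames[start:end], order) - penalty
            if score > best[end]:
                best[end] = score
                back[end] = start
    if not np.isfinite(best[T]):
        raise ValueError("utterance cannot be segmented with these length limits")
    # Trace the segment boundaries back from the end of the utterance.
    bounds = []
    end = T
    while end > 0:
        start = int(back[end])
        bounds.append((start, end))
        end = start
    return bounds[::-1], float(best[T])
```

Calling `segment_utterance(frames)` on an array of shape `(T, D)` returns a list of `(start, end)` frame index pairs and the total score. In a real coder the segment score would come from the fitted model, and the length limits would be chosen to match the target bit-rate.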
Cite as: Zolfaghari, P., Robinson, T. (1999) Speech coding using mixture of gaussians polynomial model. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1495-1498, doi: 10.21437/Eurospeech.1999-340
@inproceedings{zolfaghari99_eurospeech, author={Parham Zolfaghari and Tony Robinson}, title={{Speech coding using mixture of gaussians polynomial model}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={1495--1498}, doi={10.21437/Eurospeech.1999-340} }