Sixth European Conference on Speech Communication and Technology
We have investigated a novel method of spectral estimation based on a mixture of Gaussians in a sinusoidal analysis and synthesis framework. After quantisation of this parametric scheme, a fixed frame-rate coder operating at a bit-rate of around 2.4 kbits/s has been developed. This paper describes an extension to this spectral model in which the parameters of the mixture of Gaussians are constrained to lie on a polynomial trajectory over a segment of speech data. This is referred to as the mixture of Gaussians polynomial model (MGPM). In order to realise a segmental coder, dynamic programming is performed over the utterance. The segmental representation of the spectra yields a log-likelihood score over each segment, which is used as the cost function in the dynamic programming algorithm. Speech coding components such as pitch, voicing and gain are also described segmentally. A number of segmental coders are presented with bit-rates in the range of 350 to 650 bits/s. At these bit-rates the coders offer good, intelligible coded speech, as evaluated using DRT (Diagnostic Rhyme Test) scoring.
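The dynamic programming search over segment boundaries can be illustrated with a minimal sketch. It is not the authors' implementation: the segment score used here is a stand-in (the negative squared residual of a straight-line fit to a scalar feature track, i.e. a degree-1 polynomial trajectory with an assumed fixed variance), whereas the paper uses the MGPM log-likelihood. The names `linear_fit_score` and `segment`, and the parameters `max_len` and `penalty`, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple


def linear_fit_score(frames: List[float], start: int, end: int) -> float:
    """Stand-in segment score (assumption, not the MGPM likelihood):
    negative squared residual of a least-squares line through frames[start:end]."""
    n = end - start
    ys = frames[start:end]
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    residual = sum(
        (y - (mean_y + slope * (x - mean_x))) ** 2 for x, y in enumerate(ys)
    )
    return -residual


def segment(
    frames: List[float],
    max_len: int,
    penalty: float,
    score: Callable[[List[float], int, int], float] = linear_fit_score,
) -> List[Tuple[int, int]]:
    """Return the (start, end) segments maximising the summed segment scores,
    with a fixed per-segment penalty (hypothetical) to discourage tiny segments."""
    total = len(frames)
    best = [float("-inf")] * (total + 1)  # best[t]: best score for frames[:t]
    back = [0] * (total + 1)              # back[t]: start of the last segment
    best[0] = 0.0
    for end in range(1, total + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(frames, start, end) - penalty
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Trace the stored back-pointers from the end of the utterance.
    bounds: List[Tuple[int, int]] = []
    end = total
    while end > 0:
        bounds.append((back[end], end))
        end = back[end]
    bounds.reverse()
    return bounds
```

Calling `segment(track, max_len=8, penalty=0.5)` on a feature track made of two straight-line pieces should place a single boundary where the slope changes. In a coder of the kind described, the score function would be replaced by the segment log-likelihood, and the maximum segment length and any per-segment cost would be set by the target bit-rate.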
Full Paper (PDF)
Acoustic Example #1 (350)
Acoustic Example #2 (450)
Acoustic Example #3 (550)
Acoustic Example #4 (650)
Acoustic Example #5 (ORI)
Bibliographic reference. Zolfaghari, Parham / Robinson, Tony (1999): "Speech coding using mixture of Gaussians polynomial model", in EUROSPEECH'99, 1495-1498.