The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Compression of Line Spectral Frequency Parameters using the Asynchronous Interpolation Model

Alexander Kain, Todd Leen

Division of Biomedical Computer Science, Oregon Health & Science University, Portland, Oregon, USA

We apply an asynchronous interpolation model (AIM) to line spectral frequency (LSF) trajectories. AIM represents speech transitions as a crossfade between basis vectors, with each feature component governed by its own interpolation weight. Basis vectors are initialized from demiphone labels and then optimized with respect to a local reconstruction error. Using a small diphone acoustic inventory, we reduce the number of parameters by using dimension-reduced latent-space weights and a vector-quantized pool of basis vectors. The highest compression ratio, 1:11, resulted in a log spectral distortion of 4.83 dB.
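To make the crossfading idea concrete, here is a minimal sketch of asynchronous interpolation between two basis vectors. Each component k moves from start[k] to end[k] under its own weight w_k(t). The linear, clamped ramp used for w_k(t) and the per-component onset/duration parameters are hypothetical choices for illustration only; the abstract does not specify the weight shape, the basis-vector optimization, or the latent-space and vector-quantization steps, and none of those are shown here.

```python
from typing import List, Sequence


def ramp_weight(t: float, onset: float, duration: float) -> float:
    """Hypothetical weight shape: a linear ramp from 0 to 1, clamped to [0, 1]."""
    if duration <= 0:
        # Degenerate case: an instantaneous switch at the onset time.
        return 1.0 if t >= onset else 0.0
    return min(1.0, max(0.0, (t - onset) / duration))


def aim_crossfade(
    start: Sequence[float],
    end: Sequence[float],
    onsets: Sequence[float],
    durations: Sequence[float],
    times: Sequence[float],
) -> List[List[float]]:
    """Crossfade each component from start[k] to end[k] using its own weight.

    Because every component has its own onset and duration, components can
    transition at different times, which is the "asynchronous" part.
    """
    if not (len(start) == len(end) == len(onsets) == len(durations)):
        raise ValueError("all per-component sequences must have equal length")
    frames = []
    for t in times:
        frame = []
        for k in range(len(start)):
            w = ramp_weight(t, onsets[k], durations[k])
            # Convex combination of the two basis-vector components.
            frame.append((1.0 - w) * start[k] + w * end[k])
        frames.append(frame)
    return frames


if __name__ == "__main__":
    # Illustrative two-component "LSF" values in Hz (not taken from the paper).
    frames = aim_crossfade(
        start=[500.0, 1500.0],
        end=[300.0, 2300.0],
        onsets=[0.0, 2.0],
        durations=[4.0, 4.0],
        times=range(7),
    )
    for t, frame in enumerate(frames):
        print(t, frame)
```

In this toy example, the first component finishes its transition at t = 4 while the second is only halfway (1900.0 Hz). The seven two-dimensional frames (14 numbers) are described by two basis vectors plus one onset and one duration per component (8 numbers), which hints at why such a model can compress trajectories.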


Bibliographic reference.  Kain, Alexander / Leen, Todd (2010): "Compression of line spectral frequency parameters using the asynchronous interpolation model", In SSW7-2010, 49-54.