The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis
We apply an asynchronous interpolation model (AIM) to line spectral frequency trajectories. AIM represents speech transition features as crossfading between basis vector features, governed by individual interpolation weights per feature component. Basis vectors are initialized from demiphone labels and then optimized using a local reconstruction error. Using a small diphone acoustic inventory, we reduce the number of parameters by using dimension-reduced latent space weights and a vector-quantized pool of basis vectors. The highest compression rate of 1:11 resulted in a log spectral distortion of 4.83 dB.
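The crossfading idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sigmoid-shaped weight trajectories, and all numeric values below are assumptions chosen only to show what "individual interpolation weights per feature component" might look like, with each component switching from the left basis vector to the right one at its own time.

```python
import math

def sigmoid_weights(num_frames, midpoint, slope):
    """Hypothetical weight trajectory: a sigmoid rising from near 0 to near 1."""
    return [1.0 / (1.0 + math.exp(-slope * (t - midpoint)))
            for t in range(num_frames)]

def aim_crossfade(left, right, weights):
    """Crossfade between two basis vectors.

    weights[t][d] is the interpolation weight (0..1) for frame t and
    feature component d; 0 selects the left basis vector, 1 the right.
    """
    return [[(1.0 - w) * l + w * r
             for l, r, w in zip(left, right, frame_weights)]
            for frame_weights in weights]

# Illustrative (made-up) 3-dimensional basis vectors, e.g. LSFs in radians.
left = [0.30, 0.80, 1.50]
right = [0.50, 1.00, 1.40]

# "Asynchronous": each component has its own transition midpoint (assumed values).
num_frames = 11
midpoints = [3.0, 5.0, 7.0]
per_component = [sigmoid_weights(num_frames, m, 1.5) for m in midpoints]
weights = [list(frame) for frame in zip(*per_component)]  # frames x components

trajectory = aim_crossfade(left, right, weights)
```

In this sketch a synchronous model would share a single weight per frame across all components; giving every component its own weight trajectory is what allows, for example, the first component to finish its transition before the third has begun.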
Bibliographic reference. Kain, Alexander / Leen, Todd (2010): "Compression of line spectral frequency parameters using the asynchronous interpolation model", In SSW7-2010, 49-54.