We apply an asynchronous interpolation model (AIM) to line spectral frequency trajectories. AIM represents speech transition features as crossfading between basis vector features, governed by individual interpolation weights per feature component. Basis vectors are initialized from demiphone labels and then optimized using a local reconstruction error. On a small diphone acoustic inventory, we reduce the number of parameters by using dimension-reduced latent-space weights and a vector-quantized pool of basis vectors. The highest compression rate of 1:11 resulted in a log spectral distortion of 4.83 dB.
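The crossfading idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the interpolation weights are parameterized, so the sigmoid weight function, the names `sigmoid_weight` and `crossfade`, and all numeric values below are assumptions chosen only for illustration. The sketch shows each feature component moving from a left basis vector to a right basis vector on its own schedule, which is the sense in which the interpolation is asynchronous.

```python
import math


def sigmoid_weight(t, midpoint, slope):
    """Weight rising from 0 to 1 around `midpoint` (assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))


def crossfade(left, right, midpoints, slopes, n_frames):
    """Interpolate every component between two basis vectors with its own weight."""
    frames = []
    for t in range(n_frames):
        frame = []
        for a, b, m, s in zip(left, right, midpoints, slopes):
            w = sigmoid_weight(t, m, s)
            # Convex combination: w = 0 gives the left basis value, w = 1 the right.
            frame.append((1.0 - w) * a + w * b)
        frames.append(frame)
    return frames


# Two illustrative 3-dimensional basis vectors (made-up values, not from the paper).
left = [0.30, 0.90, 1.60]
right = [0.50, 1.20, 1.40]

# Each component switches at a different frame, so the transition is asynchronous.
trajectory = crossfade(
    left, right, midpoints=[3.0, 5.0, 7.0], slopes=[1.5, 1.5, 1.5], n_frames=11
)
```

In this sketch the first component has nearly finished its transition by frame 5 while the third has barely started; a synchronous model would instead apply one shared weight to all components.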
Cite as: Kain, A., Leen, T. (2010) Compression of line spectral frequency parameters using the asynchronous interpolation model. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 49-54
@inproceedings{kain10_ssw,
  author={Alexander Kain and Todd Leen},
  title={{Compression of line spectral frequency parameters using the asynchronous interpolation model}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={49--54}
}