Interspeech'2005 - Eurospeech
Inverse problems with respect to parameters of the articulatory model are solved for all types of sounds: vowels, semi-vowels, nasals, stops and fricatives in various contexts. Acoustical parameters of the speech signal and trajectories of some reference points inside the vocal tract serve as input data. 3.7%, 3.8% and 2.6% average approximation error for the first three formants, 8.5% for the specific frequencies of fricative spectra, 2.8% for the coordinates of reference points for all kinds of phonemes are obtained when both - acoustic and articulatory data are used. 1.8%, 1.6%, and 1.1% error for the first three formant frequencies, and 6% for the coordinates of reference points are obtained when only acoustic data are used. Original and re-synthesized utterances are found to be very similar in appearance, according to subjective assessment.
Bibliographic reference. Sorokin, Viktor N. / Leonov, A. S. / Makarov, I. S. / Tsyplikhin, A. I. (2005): "Speech inversion and re-synthesis", In INTERSPEECH-2005, 3209-3212.