Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Speech Inversion and Re-Synthesis

Viktor N. Sorokin (1), A. S. Leonov (2), I. S. Makarov (1), A. I. Tsyplikhin (1)

(1) Russian Academy of Sciences, Russia; (2) Moscow Physical Engineering Institute, Russia

Inverse problems with respect to the parameters of the articulatory model are solved for all types of sounds (vowels, semi-vowels, nasals, stops and fricatives) in various contexts. Acoustic parameters of the speech signal and the trajectories of some reference points inside the vocal tract serve as input data. When both acoustic and articulatory data are used, the average approximation errors over all kinds of phonemes are 3.7%, 3.8% and 2.6% for the first three formants, 8.5% for the specific frequencies of fricative spectra, and 2.8% for the coordinates of the reference points. When only acoustic data are used, the errors are 1.8%, 1.6% and 1.1% for the first three formant frequencies and 6% for the coordinates of the reference points. According to subjective assessment, the original and re-synthesized utterances are found to be very similar.
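
As a rough illustration of what an acoustic inverse problem and an average approximation error look like, the sketch below fits a single parameter of a toy model to three formant values. It is not the authors' articulatory model or solver: the uniform-tube formula F_n = (2n - 1)c / (4L), the grid search, the mean-relative-error definition, the function names and the sample formant values are all assumptions made for this example.

```python
# Toy inversion: recover the length of a uniform tube from formant frequencies.
# Assumption: the vocal tract is replaced by a uniform tube closed at one end,
# which is far simpler than the articulatory model described in the abstract.

SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(length_cm, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def mean_relative_error(measured, modelled):
    """Average of |modelled - measured| / measured, expressed as a percentage."""
    total = sum(abs(m - f) / f for f, m in zip(measured, modelled))
    return 100.0 * total / len(measured)


def invert_tube_length(measured, lo=10.0, hi=25.0, steps=1500):
    """Grid search for the length that minimises the mean relative formant error."""
    best_length, best_error = lo, float("inf")
    for i in range(steps + 1):
        length = lo + (hi - lo) * i / steps
        error = mean_relative_error(measured, tube_formants(length, len(measured)))
        if error < best_error:
            best_length, best_error = length, error
    return best_length, best_error


# Hypothetical formant measurements in Hz, chosen only for illustration.
measured_formants = [520.0, 1480.0, 2450.0]
length, error = invert_tube_length(measured_formants)
print(f"estimated length: {length:.2f} cm, mean formant error: {error:.1f}%")
```

A realistic inversion would involve many articulatory parameters and would also have to match the reference-point trajectories; the single-parameter grid search is used here only to keep the example short.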

Bibliographic reference.  Sorokin, Viktor N. / Leonov, A. S. / Makarov, I. S. / Tsyplikhin, A. I. (2005): "Speech inversion and re-synthesis", In INTERSPEECH-2005, 3209-3212.