ISCA Archive SSW 2021

Vocal tract area function extraction using ultrasound for articulatory speech synthesis

Debasish Ray Mohapatra, Pramit Saha, Yadong Liu, Bryan Gick, Sidney Fels

This paper studies the feasibility of an articulatory speech synthesizer driven by mid-sagittal tongue and palate contours extracted with ultrasound (US) imaging. The extracted contours are used to compute the vocal tract cross-sectional areas (i.e., the area function) during phonation, which in turn drive an articulatory speech synthesizer. Using this approach, we synthesized four vowels (/a/, /i/, /e/ and /o/). The derived vocal tract (VT) transfer functions are shown to match across multiple utterances of the same vowel, indicating that the area function can be derived reliably and accurately from US. The formants of the vowels simulated with the proposed method deviate modestly from those of the speaker’s recorded speech, since the current articulatory model does not include the mouth radiation mechanism. Furthermore, the positions of the higher formants (F5-F8) are approximately equivalent to the high-quality standard MRI-based acoustic results, with average errors of 3.90%, 4.14%, 1.26% and 2.99% for the vowels /a/, /i/, /e/ and /o/, respectively. Our approach is a step towards a US-based speech synthesizer that extracts the upper VT geometry precisely and lets speakers drive an articulatory model directly with their tongue movements, without the need to vocalize.
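The abstract does not spell out how the contours become an area function or how the formant errors are averaged, so the following is only a minimal sketch of one common way to do each step, not the authors' method. It assumes tongue and palate heights sampled at matching points along the tract, a power-law mapping A = alpha * d^beta from mid-sagittal distance to area, and a mean absolute percentage error over formants. The function names, the `alpha` and `beta` defaults, and all numbers are hypothetical placeholders.

```python
def midsagittal_distance(tongue_y, palate_y):
    """Tongue-to-palate gap (cm) at each sample point along the tract."""
    if len(tongue_y) != len(palate_y):
        raise ValueError("contours must be sampled at the same points")
    # Clip at zero so a noisy contour cannot produce a negative gap.
    return [max(p - t, 0.0) for t, p in zip(tongue_y, palate_y)]


def area_function(distances_cm, alpha=1.5, beta=1.5):
    """Assumed power-law mapping A = alpha * d**beta (cm^2).

    alpha and beta are placeholder values, not taken from the paper.
    """
    return [alpha * d ** beta for d in distances_cm]


def mean_percent_error(estimated_hz, reference_hz):
    """Mean absolute percentage error between two formant lists."""
    if len(estimated_hz) != len(reference_hz):
        raise ValueError("formant lists must have the same length")
    errors = [abs(e - r) / r * 100.0
              for e, r in zip(estimated_hz, reference_hz)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up contour heights (cm), purely for illustration.
    tongue = [1.0, 1.4, 2.1, 2.6]
    palate = [2.5, 2.8, 2.9, 3.0]
    areas = area_function(midsagittal_distance(tongue, palate))
    print([round(a, 2) for a in areas])

    # Made-up formant values (Hz), not results from the paper.
    print(mean_percent_error([4400.0, 5350.0], [4500.0, 5500.0]))
```

In practice the mapping coefficients would vary along the tract and between speakers, and the resulting areas would feed whatever tube model the synthesizer uses.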


doi: 10.21437/SSW.2021-16

Cite as: Mohapatra, D.R., Saha, P., Liu, Y., Gick, B., Fels, S. (2021) Vocal tract area function extraction using ultrasound for articulatory speech synthesis. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 90-95, doi: 10.21437/SSW.2021-16

@inproceedings{mohapatra21_ssw,
  author={Debasish Ray Mohapatra and Pramit Saha and Yadong Liu and Bryan Gick and Sidney Fels},
  title={{Vocal tract area function extraction using ultrasound for articulatory speech synthesis}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={90--95},
  doi={10.21437/SSW.2021-16}
}