ISCA Archive SPASR 2013

Automatic classification of palatal and pharyngeal wall shape categories from speech acoustics and inverted articulatory signals

Ming Li, Adam Lammert, Jangwon Kim, Prasanta Kumar Ghosh, Shrikanth S. Narayanan

Inter-speaker variability is pervasive in speech, and the ability to predict its sources from acoustics offers scientific and technological advantages. An important source of this variability is vocal tract morphology. This work proposes a statistical model-based approach to classifying the shape of the hard palate and the pharyngeal wall from speech audio. We used principal component analysis to parameterize the morphological shapes. K-means clustering showed that both the palate and the pharyngeal wall shape data group into two major categories. These categories in turn serve as targets for automatic classification using acoustic features derived at the utterance level with openSMILE and at the model level using GMM-based posterior probability supervectors. Because articulatory motions depend on morphological shape, the model adds estimated articulatory features to the speech acoustics to improve classification performance. Experimental results showed unweighted accuracies of 70% and 63% for binary classification of palate and pharyngeal wall shapes, respectively, on the rtMRI database, and 63% for palate shape on the X-Ray Microbeam database.
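The results above are reported as unweighted accuracy. As a minimal sketch, assuming the term is used in its common sense of the mean of per-class recalls (so that each shape category counts equally regardless of how many speakers fall into it), the metric could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of unweighted accuracy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced two-category example: a classifier that always
# predicts the majority category "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ua = unweighted_accuracy(y_true, y_pred)
print(f"plain accuracy: {plain:.2f}, unweighted accuracy: {ua:.2f}")
```

Under this definition, a binary classifier that always predicts one category scores 50% no matter how imbalanced the categories are, which is the natural chance level against which the reported 70% and 63% figures can be read.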

Index Terms: speech production, vocal tract morphology, acoustic-to-articulatory inversion, speaker recognition


Cite as: Li, M., Lammert, A., Kim, J., Ghosh, P.K., Narayanan, S.S. (2013) Automatic classification of palatal and pharyngeal wall shape categories from speech acoustics and inverted articulatory signals. Proc. Speech Production in Automatic Speech Recognition (SPASR-2013), 34-39

@inproceedings{li13_spasr,
  author={Ming Li and Adam Lammert and Jangwon Kim and Prasanta Kumar Ghosh and Shrikanth S. Narayanan},
  title={{Automatic classification of palatal and pharyngeal wall shape categories from speech acoustics and inverted articulatory signals}},
  year=2013,
  booktitle={Proc. Speech Production in Automatic Speech Recognition (SPASR-2013)},
  pages={34--39}
}