Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Multistage Coarticulation Model Combining Articulatory, Formant and Cepstral Features

Yuqing Gao, Raimo Bakis, Jing Huang, Bing Xiang

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

We describe a multi-stage speech production model consisting of a linear, phoneme-independent coarticulation filter followed by a nonlinear component. The nonlinear component generates two cepstra that are then added together: one represents a relatively smooth background spectrum, and the other represents three formant-like spectral peaks. A neural net is used for both parts, but the second part also uses a hard-coded function that generates exactly three spectral peaks. We develop a unified model of training, adaptation, and decoding in which the three operations differ only in their prior probability distributions. Prior probabilities can be introduced at each stage of the model, which provides a flexible framework for using both specific and general prior knowledge. We demonstrate the use of this model for speech synthesis as well as recognition.
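The staged structure described above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and every function name is hypothetical. Several choices are illustrative assumptions:

* The coarticulation filter is modeled as one centered FIR kernel that is shared by all phonemes, with edge frames clamped.
* The "hard-coded" three-peak function is taken to be the standard cepstrum of cascaded resonances, c_n = 2 r^n cos(n θ)/n, with r = exp(-π B / f_s) and θ = 2π F / f_s for a formant at frequency F with bandwidth B.
* The smooth background cepstrum, which the paper obtains from a neural net, is simply passed in as a vector.

```python
import math
from typing import List, Sequence, Tuple


def coarticulation_filter(targets: Sequence[Sequence[float]],
                          taps: Sequence[float]) -> List[List[float]]:
    """Hypothetical linear, phoneme-independent coarticulation filter.

    Smooths a sequence of per-frame target vectors with one centered FIR
    kernel that is the same for every phoneme (an assumption); frames past
    either end of the sequence are clamped to the first or last frame.
    """
    half = len(taps) // 2
    n_frames, dim = len(targets), len(targets[0])
    out = []
    for t in range(n_frames):
        frame = [0.0] * dim
        for k, w in enumerate(taps):
            src = min(max(t + k - half, 0), n_frames - 1)  # clamp at edges
            for d in range(dim):
                frame[d] += w * targets[src][d]
        out.append(frame)
    return out


def formant_cepstrum(formants: Sequence[Tuple[float, float]],
                     fs: float, order: int) -> List[float]:
    """Cepstral coefficients c_1..c_order of a cascade of resonances.

    Assumed stand-in for the hard-coded peak generator: each (F, B) pair is
    a conjugate pole pair with radius r = exp(-pi*B/fs) and angle
    theta = 2*pi*F/fs, whose cepstrum is 2 r^n cos(n theta) / n.
    Cascading resonances multiplies their spectra, so their cepstra add.
    """
    c = [0.0] * order
    for freq, bw in formants:
        r = math.exp(-math.pi * bw / fs)
        theta = 2.0 * math.pi * freq / fs
        for n in range(1, order + 1):
            c[n - 1] += 2.0 * r ** n * math.cos(n * theta) / n
    return c


def combine(background: Sequence[float], peaks: Sequence[float]) -> List[float]:
    """Additively combine the smooth-background and spectral-peak cepstra."""
    return [b + p for b, p in zip(background, peaks)]


def log_magnitude(cep: Sequence[float], omega: float) -> float:
    """Natural-log spectral magnitude implied by c_1..c_N (c_0 taken as 0)."""
    return sum(c * math.cos(n * omega) for n, c in enumerate(cep, start=1))


if __name__ == "__main__":
    fs = 10000.0
    peaks = formant_cepstrum([(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)],
                             fs, order=20)
    background = [0.0] * 20  # placeholder for the neural net's background cepstrum
    print(combine(background, peaks)[:3])
```

The two cepstra are combined by addition, which corresponds to multiplying the background spectrum by the three-peak spectrum. For this reason, the peak generator can be a fixed analytic function while the background is left to a learned model.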


Bibliographic reference.  Gao, Yuqing / Bakis, Raimo / Huang, Jing / Xiang, Bing (2000): "Multistage coarticulation model combining articulatory, formant and cepstral features", In ICSLP-2000, vol.1, 25-28.