11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

One-Model Speech Recognition and Synthesis Based on Articulatory Movement HMMs

Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada

Toyohashi University of Technology, Japan

One-model speech recognition (SR) and synthesis (SS) based on a common articulatory movement model are described. The SR engine consists of an articulatory feature (AF) extractor and an HMM-based classifier that models articulatory gestures. Experimental results on a phoneme recognition task show that AFs outperform MFCCs even when the training data are limited to a single speaker. In the SS engine, the same speaker-invariant HMMs are applied to generate an AF sequence; after the AFs are converted into vocal tract parameters, the speech signal is synthesized by a PARCOR filter together with a residual signal. Phoneme-to-phoneme speech conversion by exchanging AFs is also described.
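The synthesis step described above drives an all-pole PARCOR (lattice) filter with a residual excitation. The sketch below is a minimal, frame-independent lattice synthesis filter driven by reflection (PARCOR) coefficients; the function name, sign convention, and per-sample loop are illustrative assumptions, not the paper's implementation, which would additionally switch coefficients frame by frame as the AF-derived vocal tract parameters evolve.

```python
def parcor_synth(excitation, k):
    """Synthesize a signal by feeding an excitation (residual) through
    an all-pole lattice filter defined by PARCOR coefficients k[0..M-1].

    Stage recursion (per sample n), inverting the analysis lattice:
        f_{i}[n]   = f_{i+1}[n] - k[i] * b_i[n-1]
        b_{i+1}[n] = b_i[n-1]   + k[i] * f_i[n]
    with f_M[n] = excitation[n] and output f_0[n] = b_0[n].
    """
    M = len(k)
    d = [0.0] * M          # d[i] holds the delayed backward signal b_i[n-1]
    out = []
    for e in excitation:
        f = e              # top of the lattice: f_M[n]
        for i in range(M - 1, -1, -1):
            f = f - k[i] * d[i]            # forward path down one stage
            if i + 1 < M:
                d[i + 1] = d[i] + k[i] * f  # store b_{i+1}[n] for next sample
        d[0] = f           # b_0[n] = f_0[n]
        out.append(f)
    return out


# Usage: a first-order lattice with k = [0.5] realizes 1 / (1 + 0.5 z^-1),
# so its impulse response decays geometrically with ratio -0.5.
print(parcor_synth([1.0, 0.0, 0.0], [0.5]))  # → [1.0, -0.5, 0.25]
```

A property that makes PARCOR coefficients attractive here is that the filter is stable whenever every |k[i]| < 1, which is easy to enforce when the coefficients are predicted from articulatory features rather than estimated directly.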

Full Paper

Bibliographic reference. Nitta, Tsuneo / Onoda, Takayuki / Kimura, Masashi / Iribe, Yurie / Katsurada, Kouichi (2010): "One-model speech recognition and synthesis based on articulatory movement HMMs", In INTERSPEECH-2010, 2970-2973.