One-model speech recognition (SR) and speech synthesis (SS) based on a common articulatory movement model are described. The SR engine has an articulatory feature (AF) extractor and an HMM-based classifier that models articulatory gestures. Experimental results on a phoneme recognition task show that AFs outperform MFCCs even when the training data are limited to a single speaker. In the SS engine, the same speaker-invariant HMM is applied to generate an AF sequence; the AFs are then converted into vocal tract parameters, and the speech signal is synthesized by a PARCOR filter driven by a residual signal. Phoneme-to-phoneme speech conversion using AF exchange is also described.
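The final synthesis step, in which a PARCOR filter is excited by a residual signal, can be sketched as a standard all-pole lattice filter. The sketch below is illustrative only and is not the authors' implementation: the function name `parcor_synthesize` and the assumption that the vocal tract parameters have already been reduced to reflection (PARCOR) coefficients `k` are both assumptions for the example.

```python
import numpy as np

def parcor_synthesize(residual, k):
    """Minimal all-pole lattice synthesis sketch (not the paper's code).

    residual : excitation signal e(n), one sample per time step
    k        : PARCOR (reflection) coefficients k_1..k_M, assumed
               already derived from the vocal tract parameters
    Returns the synthesized signal y(n) = f_0(n).
    """
    M = len(k)
    b = np.zeros(M + 1)           # delayed backward errors b_0(n-1)..b_M(n-1)
    out = np.empty(len(residual))
    for n, e in enumerate(residual):
        f = e                     # forward error at the top stage, f_M(n)
        for i in range(M, 0, -1):
            f = f - k[i - 1] * b[i - 1]        # f_{i-1}(n) = f_i(n) - k_i b_{i-1}(n-1)
            b[i] = k[i - 1] * f + b[i - 1]     # b_i(n)     = k_i f_{i-1}(n) + b_{i-1}(n-1)
        b[0] = f                  # b_0(n) = f_0(n)
        out[n] = f
    return out

# Usage: a unit impulse through a first-order lattice with k_1 = 0.5
# behaves as y(n) = e(n) - 0.5 * y(n-1).
impulse = np.zeros(5)
impulse[0] = 1.0
y = parcor_synthesize(impulse, [0.5])
```

Because the lattice stages use reflection coefficients directly, the filter is stable whenever every |k_i| < 1, which is one practical reason PARCOR parameters are convenient for frame-by-frame synthesis.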
Bibliographic reference. Nitta, Tsuneo / Onoda, Takayuki / Kimura, Masashi / Iribe, Yurie / Katsurada, Kouichi (2010): "One-model speech recognition and synthesis based on articulatory movement HMMs", In INTERSPEECH-2010, 2970-2973.