Speech synthesis based on a single model of articulatory-movement HMMs, applied commonly to both speech recognition (SR) and speech synthesis (SS), is described. In the SS module, speaker-invariant HMMs generate an articulatory feature (AF) sequence; the AFs are then converted into vocal-tract parameters by a multilayer neural network (MLN), and a speech signal is synthesized through an LSP digital filter. To improve the voice sources, the CELP coding technique is applied when generating these sources from codes embedded in the corresponding HMM states. Because the proposed SS module separates phonetic information from speaker individuality, a target speaker's voice can be synthesized from a small amount of speech data. In experiments, we carried out listening tests with ten subjects and evaluated both the sound quality and the individuality of the synthesized speech. The results confirmed that the proposed SS module can produce good-quality speech of the target speaker even when trained on only two sentences of data.
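The synthesis back end described above (a network mapping AFs to vocal-tract parameters, followed by an LSP/LPC digital filter driven by a voice-source excitation) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual configuration: the network weights, the LSF-valued output layer, the LPC order, and the pulse-train excitation (a crude stand-in for the CELP codebook source) are all illustrative assumptions.

```python
import numpy as np

def mlp_af_to_lsf(af, W1, b1, W2, b2):
    # One-hidden-layer network (a stand-in for the paper's MLN) mapping an
    # articulatory-feature vector to line spectral frequencies in (0, pi).
    h = np.tanh(af @ W1 + b1)
    z = h @ W2 + b2
    # Normalized cumulative sum keeps the LSFs strictly increasing in (0, pi),
    # which guarantees a stable all-pole synthesis filter.
    inc = np.exp(z) / (np.exp(z).sum() + 1.0)
    return np.pi * np.cumsum(inc)

def lsf_to_lpc(lsf):
    # Reconstruct A(z) = 1 + a1 z^-1 + ... + ap z^-p from an even number of
    # sorted LSFs via the symmetric (P) and antisymmetric (Q) polynomials.
    p = len(lsf)
    assert p % 2 == 0, "this sketch assumes an even LPC order"
    P, Q = np.array([1.0, 1.0]), np.array([1.0, -1.0])
    for k, w in enumerate(lsf):
        quad = np.array([1.0, -2.0 * np.cos(w), 1.0])
        if k % 2 == 0:
            P = np.polymul(P, quad)   # odd-indexed LSFs (1-based) -> P(z)
        else:
            Q = np.polymul(Q, quad)   # even-indexed LSFs -> Q(z)
    return (0.5 * (P + Q))[:p + 1]    # A(z) = (P(z) + Q(z)) / 2

def synthesize(excitation, a):
    # Direct-form all-pole filter: s[n] = e[n] - sum_k a[k] * s[n-k].
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n >= k:
                acc -= a[k] * s[n - k]
        s[n] = acc
    return s

# Toy frame: a random AF vector and random network weights, with a pulse
# train standing in for the codebook-derived voice source.
rng = np.random.default_rng(0)
af = rng.standard_normal(8)
W1, b1 = rng.standard_normal((8, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((16, 10)) * 0.1, np.zeros(10)
lsf = mlp_af_to_lsf(af, W1, b1, W2, b2)
a = lsf_to_lpc(lsf)
excitation = np.zeros(160)
excitation[::40] = 1.0            # 4 pitch pulses in a 160-sample frame
frame = synthesize(excitation, a)
```

Because the LSFs are forced to be strictly increasing in (0, pi), the reconstructed A(z) is minimum-phase and the synthesis filter is stable, which is one reason LSP-domain parameters are a convenient target for a learned AF-to-spectrum mapping.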
Bibliographic reference. Nitta, Tsuneo / Onoda, Takayuki / Kimura, Masashi / Iribe, Yurie / Katsurada, Kouichi (2011): "Speech synthesis based on articulatory-movement HMMs with voice-source codebooks", In INTERSPEECH-2011, 1841-1844.