This paper proposes an integrated model that jointly represents the linguistic and acoustic models, together with a training algorithm for it. Text-to-speech (TTS) systems based on hidden Markov models (HMMs) usually consist of a text analysis module and a speech synthesis module, and their linguistic and acoustic models are trained independently on different data sets. The proposed training algorithm instead optimizes both parameter sets of the integrated model simultaneously. Because this directly formulates the TTS problem as generating a speech waveform from a word sequence, an appropriate model can be estimated. Objective evaluation experiments using phrasing and prosodic models were conducted to assess the effectiveness of the proposed technique.
Index Terms: TTS system, hidden Markov model, phrasing model, prosodic model
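The contrast between conventional and simultaneous training can be illustrated with the following objectives. This is a sketch under an assumption, not necessarily the paper's exact formulation: the intermediate linguistic labels are treated as hidden variables. Here $\boldsymbol{w}$ denotes the word sequence, $\boldsymbol{l}$ the linguistic (phrasing and prosodic) labels, $\boldsymbol{o}$ the acoustic observations, and $\lambda_l$ and $\lambda_a$ the linguistic and acoustic model parameters. All of these symbols are introduced here for illustration.

$$
\begin{aligned}
\text{independent:}\quad & \hat{\lambda}_l = \arg\max_{\lambda_l} P(\boldsymbol{l} \mid \boldsymbol{w}, \lambda_l), \qquad \hat{\lambda}_a = \arg\max_{\lambda_a} P(\boldsymbol{o} \mid \boldsymbol{l}, \lambda_a) \\
\text{simultaneous:}\quad & (\hat{\lambda}_l, \hat{\lambda}_a) = \arg\max_{\lambda_l, \lambda_a} \sum_{\boldsymbol{l}} P(\boldsymbol{o} \mid \boldsymbol{l}, \lambda_a)\, P(\boldsymbol{l} \mid \boldsymbol{w}, \lambda_l)
\end{aligned}
$$

In the second form, both parameter sets are fitted to a single objective: the likelihood of the speech given the words. This is one way to read "directly formulating" the TTS problem.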
Cite as: Oura, K., Nankaku, Y., Toda, T., Tokuda, K., Maia, R., Sakai, S., Nakamura, S. (2008) Simultaneous Phrasing, Prosody, and Acoustic Model Training for Text-to-Speech Conversion. Proc. International Symposium on Chinese Spoken Language Processing, 1-4
@inproceedings{oura08_iscslp, author={Keiichiro Oura and Yoshihiko Nankaku and Tomoki Toda and Keiichi Tokuda and Rannierry Maia and Shinsuke Sakai and Satoshi Nakamura}, title={{Simultaneous Phrasing, Prosody, and Acoustic Model Training for Text-to-Speech Conversion}}, year=2008, booktitle={Proc. International Symposium on Chinese Spoken Language Processing}, pages={1--4} }