ISCA Archive ISCSLP 2002

An NN-based approach to prosody generation for English word spelling in English-Chinese bilingual TTS

Wei-Chih Kuo, Yih-Ru Wang, Hung-Mao Lu, Sin-Horng Chen

This paper proposes an RNN-MLP-based scheme for generating proper prosodic information for English words that are spelled out letter by letter within Chinese text. The scheme extends the RNN prosody synthesis scheme of an existing Mandarin TTS system by adding four MLPs after the RNN. It first treats each English word as a Chinese word and uses the RNN to generate eight prosodic parameters for each letter of the word. The four MLPs then refine these prosodic parameters. Experimental results show that, relative to the scheme using the RNN only, the proposed RNN-MLP scheme reduces the RMSE of the synthesized letter duration, log-energy level, pitch contour, and pause duration by 36.3%, 37.3%, 11.6%, and 29.1%, respectively.
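
The reported figures are relative RMSE reductions of one scheme over another. As a minimal sketch of how such a percentage can be computed, assuming the usual definition (baseline RMSE minus refined RMSE, divided by baseline RMSE), the snippet below uses made-up duration values; the function names `rmse` and `relative_reduction_pct` and all numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction_pct(baseline_rmse, refined_rmse):
    """Percentage by which refined_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - refined_rmse) / baseline_rmse


# Hypothetical letter durations in milliseconds (illustrative only).
target = [100.0, 120.0, 90.0, 110.0]      # measured durations
rnn_only = [110.0, 110.0, 100.0, 100.0]   # stand-in for an RNN-only output
rnn_mlp = [105.0, 115.0, 95.0, 105.0]     # stand-in for a refined output

baseline = rmse(rnn_only, target)
refined = rmse(rnn_mlp, target)
reduction = relative_reduction_pct(baseline, refined)
print(f"RMSE {baseline:.1f} -> {refined:.1f} ms, reduction {reduction:.1f}%")
```

Under this definition, a 36.3% reduction means the refined RMSE is 63.7% of the RNN-only RMSE for that parameter.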


Cite as: Kuo, W.-C., Wang, Y.-R., Lu, H.-M., Chen, S.-H. (2002) An NN-based approach to prosody generation for English word spelling in English-Chinese bilingual TTS. Proc. International Symposium on Chinese Spoken Language Processing, paper 127

@inproceedings{kuo02_iscslp,
  author={Wei-Chih Kuo and Yih-Ru Wang and Hung-Mao Lu and Sin-Horng Chen},
  title={{An NN-based approach to prosody generation for English word spelling in English-Chinese bilingual TTS}},
  year=2002,
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing},
  pages={paper 127}
}