International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

An NN-based Approach to Prosody Generation for English Word Spelling in English-Chinese Bilingual TTS

Wei-Chih Kuo, Yih-Ru Wang, Hung-Mao Lu, Sin-Horng Chen

Chiao Tung University, Hsinchu, Taiwan

In this paper, an RNN-MLP-based scheme is proposed for generating proper prosodic information for spelled English words embedded in Chinese text. It extends the RNN prosody synthesis scheme of an existing Mandarin TTS system by adding four MLPs after the RNN. Each English word is first treated as a Chinese word, and the RNN generates eight prosodic parameters for each letter of the word. The four MLPs then refine these prosodic parameters. Experimental results show that the proposed RNN-MLP scheme reduces the RMSE of the synthesized letter duration, log-energy level, pitch contour, and pause duration by 36.3%, 37.3%, 11.6%, and 29.1%, respectively, relative to the scheme using the RNN only.
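
As an illustration of how such relative RMSE reductions can be computed, the following minimal Python sketch assumes that each reduction is measured as a percentage of the RNN-only RMSE; the abstract does not state the exact formula. The function names and the toy duration values are hypothetical and are not taken from the experiments.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, refined_rmse):
    """Percentage reduction of refined_rmse with respect to baseline_rmse."""
    return 100.0 * (baseline_rmse - refined_rmse) / baseline_rmse


# Hypothetical per-letter durations in milliseconds (illustrative only).
target = [100.0, 120.0, 90.0, 110.0]    # reference values
rnn_only = [110.0, 110.0, 100.0, 100.0]  # assumed RNN-only predictions
rnn_mlp = [105.0, 115.0, 95.0, 105.0]    # assumed RNN-MLP predictions

baseline = rmse(rnn_only, target)
refined = rmse(rnn_mlp, target)
reduction = relative_reduction(baseline, refined)
print(f"RNN only: {baseline:.1f} ms, RNN-MLP: {refined:.1f} ms, "
      f"reduction: {reduction:.1f}%")
```

Under this assumed definition, a reported reduction of 36.3% would mean that the RNN-MLP RMSE is 63.7% of the RNN-only RMSE for the same prosodic parameter.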

