International Symposium on Chinese Spoken Language Processing
August 23-24, 2002
An NN-based Approach to Prosody Generation for English Word Spelling in English-Chinese Bilingual TTS
Wei-Chih Kuo, Yih-Ru Wang, Hung-Mao Lu, Sin-Horng Chen
Chiao Tung University, Hsinchu, Taiwan
In this paper, an RNN-MLP-based scheme is proposed to generate proper
prosodic information for spelling out English words embedded in
Chinese text. It extends the RNN prosody synthesis scheme of an
existing Mandarin TTS system by adding four MLPs after the RNN. It
first treats each English word as a Chinese word and uses the RNN to
generate eight prosodic parameters for each letter of the word. It then
uses the four MLPs to refine these prosodic parameters.
Experimental results showed that the proposed RNN-MLP
scheme reduced the RMSE by 36.3%, 37.3%, 11.6%, and 29.1%
for the synthesized letter duration, log-energy level, pitch
contour, and pause duration, respectively, compared with the scheme
using the RNN only.
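The reported figures are relative RMSE reductions. As a minimal sketch of how such a figure could be computed, the following assumes the reduction is measured as (baseline RMSE - refined RMSE) / baseline RMSE. The function names and all numeric values are hypothetical illustrations and are not taken from the paper.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, refined_rmse):
    """Percentage by which refined_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - refined_rmse) / baseline_rmse


# Hypothetical letter durations in milliseconds (illustrative only).
target = [100.0, 120.0, 90.0, 110.0]     # measured from recorded speech
rnn_only = [110.0, 110.0, 100.0, 100.0]  # assumed RNN-only predictions
rnn_mlp = [105.0, 115.0, 95.0, 105.0]    # assumed RNN-MLP predictions

baseline = rmse(rnn_only, target)
refined = rmse(rnn_mlp, target)
reduction = relative_reduction(baseline, refined)
print(f"RMSE {baseline:.1f} -> {refined:.1f}, reduction {reduction:.1f}%")
```

Under this definition, the same calculation would be applied separately to each of the four prosodic features listed above.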
Kuo, Wei-Chih / Wang, Yih-Ru / Lu, Hung-Mao / Chen, Sin-Horng (2002):
"An NN-based approach to prosody generation for English word spelling in English-Chinese bilingual TTS",
In ISCSLP 2002, paper 127.