Speech Prosody 2004

Nara, Japan
March 23-26, 2004

F0 Analysis and Modeling for Cantonese Text-to-Speech

Yujia Li, Tan Lee, Yao Qian

Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

This paper presents a study on the control of fundamental frequency (F0) in Cantonese text-to-speech (TTS) systems. The surface F0 contour of an utterance is treated as a combination of tone-related local components and phrase-level long-term variation. A novel method of F0 normalization has been developed to separate the two effectively. Statistical analysis is performed on the phrase curves and tone contours extracted from a large speech corpus, and the results are summarized into regular patterns. These patterns serve as the basic templates in a non-parametric F0 model, from which utterance-level F0 contours can be generated. A perceptual test shows that speech naturalness is significantly improved by the new F0 model: the mean opinion score (MOS) increases by 0.65 on a five-point scale.
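The two-component view of the F0 contour described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the normalization procedure or the template values, so everything here is an assumption made for illustration. In particular, the sketch assumes that the components add in the log-frequency (semitone) domain, that the phrase curve is a simple log-linear declination, and that each tone is a three-point template. The names `TONE_TEMPLATES`, `phrase_curve`, `generate_f0`, and `tone_residual` are hypothetical, and the numeric offsets are invented placeholders rather than corpus statistics.

```python
import math

POINTS_PER_SYLLABLE = 3  # assumed sampling of each tone contour

# Hypothetical templates for the six Cantonese tones, as semitone offsets
# from the phrase curve. The values are placeholders, not data from the paper.
TONE_TEMPLATES = {
    1: [4.0, 4.0, 4.0],     # high level
    2: [-2.0, 0.0, 3.0],    # high rising
    3: [1.0, 1.0, 1.0],     # mid level
    4: [-3.0, -4.0, -5.0],  # low falling
    5: [-3.0, -2.0, -1.0],  # low rising
    6: [-1.0, -1.0, -1.0],  # low level
}


def phrase_curve(n_points, start_hz, end_hz):
    """Phrase-level component in Hz: log-linear declination (an assumed shape)."""
    if n_points == 1:
        return [start_hz]
    log_start, log_end = math.log2(start_hz), math.log2(end_hz)
    step = (log_end - log_start) / (n_points - 1)
    return [2.0 ** (log_start + i * step) for i in range(n_points)]


def generate_f0(tones, start_hz=220.0, end_hz=180.0):
    """Superimpose the tone templates on the phrase curve; returns F0 in Hz."""
    n_points = len(tones) * POINTS_PER_SYLLABLE
    phrase = phrase_curve(n_points, start_hz, end_hz)
    offsets = [st for tone in tones for st in TONE_TEMPLATES[tone]]
    # Adding semitones in the log domain is multiplying by 2^(st/12) in Hz.
    return [hz * 2.0 ** (st / 12.0) for hz, st in zip(phrase, offsets)]


def tone_residual(surface_f0, phrase):
    """Stand-in for normalization: semitone distance of F0 from the phrase curve."""
    return [12.0 * math.log2(f / p) for f, p in zip(surface_f0, phrase)]
```

Under these assumptions, `generate_f0([1, 4, 2])` yields a nine-point contour for a three-syllable utterance, and passing that contour to `tone_residual` together with the matching phrase curve recovers the template offsets. Real speech would require estimating the phrase curve from data, which is the part the paper's normalization method addresses.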


Bibliographic reference.  Li, Yujia / Lee, Tan / Qian, Yao (2004): "F0 analysis and modeling for Cantonese text-to-speech", In SP-2004, 467-470.