ISCA Archive SSW 2010

Unsupervised prosody labeling for constructing Mandarin TTS

Chen Yu Chiang, Sin-Horng Chen, Yih-Ru Wang

This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constituents using both prosodic and linguistic features. The experimental results showed that the proposed unsupervised prosody labeling method could effectively label important prosodic cues, which improved prosody prediction in an HMM-based text-to-speech system. The proposed unsupervised prosody labeling method is therefore promising and could be widely applied to labeling other large speech corpora.
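The abstract does not give the features, models, or update rules, so the following is only a generic illustration of the "segment, then re-estimate the models, then re-segment" loop that iterative segmental clustering refers to. It is not the authors' algorithm. As assumptions, it uses a single hypothetical prosodic feature (the pause duration at each syllable juncture), plain 1-D k-means-style clustering in place of the paper's prosodic and linguistic models, and a hypothetical `boundary_type` threshold to turn juncture break labels into prosodic constituents. The four-layer hierarchy and the linguistic features are not modeled.

```python
from statistics import mean


def label_breaks(pauses, n_types=3, n_iter=20):
    """Iteratively assign each syllable juncture a break type and re-estimate
    one mean per type (hypothetical 1-D stand-in for the paper's models).

    pauses: pause durations in seconds, one per juncture (assumed feature).
    n_types: number of break types (must be >= 2).
    Returns (labels, means).
    """
    # Initialise the type means evenly between the shortest and longest pause,
    # so higher type indices start out as longer pauses (stronger breaks).
    lo, hi = min(pauses), max(pauses)
    means = [lo + (hi - lo) * k / (n_types - 1) for k in range(n_types)]
    labels = [0] * len(pauses)
    for _ in range(n_iter):
        # Segmentation step: give each juncture the break type with the nearest mean.
        new_labels = [min(range(n_types), key=lambda k: (p - means[k]) ** 2)
                      for p in pauses]
        # Model step: re-estimate each type's mean; an empty type keeps its old mean.
        for k in range(n_types):
            members = [p for p, lab in zip(pauses, new_labels) if lab == k]
            if members:
                means[k] = mean(members)
        # Stop once the labeling no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, means


def segment(syllables, labels, boundary_type):
    """Split a syllable string into constituents at junctures whose break type
    is >= boundary_type. labels[i] is the juncture between syllables i and i+1."""
    segments, current = [], [syllables[0]]
    for syl, lab in zip(syllables[1:], labels):
        if lab >= boundary_type:
            segments.append(current)
            current = []
        current.append(syl)
    segments.append(current)
    return segments


if __name__ == "__main__":
    syllables = list("abcdefgh")  # placeholder syllables
    pauses = [0.01, 0.02, 0.30, 0.01, 0.65, 0.03, 0.02]  # made-up values
    labels, means = label_breaks(pauses)
    print(labels)                            # [0, 0, 1, 0, 2, 0, 0]
    print(segment(syllables, labels, 1))     # three constituents
    print(segment(syllables, labels, 2))     # two larger constituents
```

Thresholding one set of break labels at different levels gives nested constituents. This is one simple way a hierarchy can emerge from juncture labels. The actual method in the paper models the constituents themselves with both prosodic and linguistic features.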

Index Terms: prosody labeling, speech synthesis


Cite as: Chiang, C.Y., Chen, S.-H., Wang, Y.-R. (2010) Unsupervised prosody labeling for constructing Mandarin TTS. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 264-269

@inproceedings{chiang10_ssw,
  author={Chen Yu Chiang and Sin-Horng Chen and Yih-Ru Wang},
  title={{Unsupervised prosody labeling for constructing Mandarin TTS}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={264--269}
}