An Exploration of Local Speaking Rate Variations in Mandarin Read Speech

Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen


This paper explores speaking rate (SR) variation in Mandarin read speech. Rather than assuming that each utterance is produced at a constant, global speaking rate, this study estimates a local speaking rate for each prosodic unit in an utterance. The exploration builds on the existing speaking rate-dependent hierarchical prosodic model (SR-HPM). The main idea is first to use the SR-HPM to determine the prosodic structures of utterances and extract their prosodic units, and then to estimate a local speaking rate for each prosodic unit (the prosodic phrase in this study). Several major influencing factors, including tone, base-syllable type, prosodic structure, and the speaking rates of higher-level prosodic units (the utterance and the breath group/prosodic phrase group, BG/PG), are compensated for in the local SR estimation. A syntactic-local SR model is then constructed and used for prosody generation in Mandarin TTS. Experimental results on a large read speech corpus recorded by a professional female announcer showed that the prosody generated with local speaking rate variations is more vivid than that generated with a constant speaking rate.
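As a rough illustration of what compensating influence factors in a local SR estimate can mean, the sketch below computes a syllables-per-second rate for one prosodic phrase after removing tone and base-syllable effects from each syllable's duration, then expresses it relative to the rate of the enclosing unit. This is not the SR-HPM procedure. It assumes, purely for illustration, that tone and base-syllable type shift syllable duration by fixed additive offsets (in seconds), it ignores prosodic-structure (break) effects, and it removes the higher-level rate by simple division. All names (`Syllable`, `compensated_duration`, `local_speaking_rate`, `relative_local_rate`) and the offset tables are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Syllable:
    duration: float  # observed syllable duration in seconds
    tone: int        # lexical tone (1-5)
    base_syl: str    # base-syllable type, e.g. "ma"


def compensated_duration(syl: Syllable,
                         tone_offset: Dict[int, float],
                         base_offset: Dict[str, float]) -> float:
    # ASSUMPTION (hypothetical model): tone and base-syllable type each add a
    # fixed offset in seconds; subtracting them leaves the part of the duration
    # attributed to speaking rate. Unknown keys contribute no offset.
    return (syl.duration
            - tone_offset.get(syl.tone, 0.0)
            - base_offset.get(syl.base_syl, 0.0))


def local_speaking_rate(phrase: List[Syllable],
                        tone_offset: Dict[int, float],
                        base_offset: Dict[str, float]) -> float:
    """Syllables per second for one prosodic phrase after compensation."""
    total = sum(compensated_duration(s, tone_offset, base_offset) for s in phrase)
    if not math.isfinite(total) or total <= 0.0:
        raise ValueError("compensated phrase duration must be positive")
    return len(phrase) / total


def relative_local_rate(phrase: List[Syllable],
                        tone_offset: Dict[int, float],
                        base_offset: Dict[str, float],
                        higher_level_rate: float) -> float:
    # ASSUMPTION: the enclosing unit's rate (utterance or BG/PG, in syllables
    # per second) is factored out by division, so 1.0 means "same rate as the
    # enclosing unit" and values above 1.0 mean a locally faster phrase.
    if higher_level_rate <= 0.0:
        raise ValueError("higher-level rate must be positive")
    return local_speaking_rate(phrase, tone_offset, base_offset) / higher_level_rate
```

With made-up toy values such as `phrase = [Syllable(0.25, 3, "ma"), Syllable(0.20, 4, "shi")]`, `tone_offset = {3: 0.05, 4: -0.02}` and `base_offset = {"ma": 0.0, "shi": 0.02}`, both compensated durations come to about 0.20 s, giving a local rate of about 5 syllables/s and, against an enclosing rate of 4 syllables/s, a relative local rate of about 1.25. A multiplicative or model-based normalization could be swapped in for the division without changing the overall structure.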


 DOI: 10.21437/Interspeech.2018-1214

Cite as: Liou, G., Chiang, C., Wang, Y., Chen, S. (2018) An Exploration of Local Speaking Rate Variations in Mandarin Read Speech. Proc. Interspeech 2018, 42-46, DOI: 10.21437/Interspeech.2018-1214.


@inproceedings{Liou2018,
  author={Guan-Ting Liou and Chen-Yu Chiang and Yih-Ru Wang and Sin-Horng Chen},
  title={An Exploration of Local Speaking Rate Variations in Mandarin Read Speech},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={42--46},
  doi={10.21437/Interspeech.2018-1214},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1214}
}