Duration, intensity and pause predictions in relation to prosody organization

Chiu-yu Tseng, Bau-Ling Fu

Our research group has postulated a perceptually based multiphrase prosody framework for speech paragraphs in fluent speech using corpus analyses. The framework features a prosody hierarchy that organizes phrases and sentences into prosodic groups (PG) in connected speech, and specifies cross-phrase prosodic relationships in the acoustic domains [1, 2]. A corresponding fluent speech prosody model with four independent acoustic modules was also constructed [3]. The model predicts cross-phrase F0 contours, duration patterns, intensity distribution and pause insertions in accordance with prosody organization. The cumulative results from all prosody layers account for the overall output prosody. We have since improved the model, first by refining the duration and intensity modules through corpus analysis, and then by using the refined results to improve pause/break prediction. As a result, the enhanced model is now more robust than its initial version. Future work will focus on applying the improved model to the synthesis of fluent connected speech.

doi: 10.21437/Interspeech.2005-503

