This paper describes two approaches to assigning prosodic phrase structure and pauses to text, and investigates the impact of assignment errors at different granularities of prosodic phrase structure. One approach uses a cascaded combination of models trained separately to predict prosodic phrase structure and pauses; the other uses a single model trained directly on the joint prediction task. Objective measurements show similar performance for both approaches, while perceptual evaluations show a slight preference for an optimised cascaded combination of prosodic phrase structure and pause models using a single-level encoding of prosodic phrase structure.
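The difference between the two approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the juncture feature dictionary, the function names `cascaded_predict` and `joint_predict`, and the punctuation-based toy rules that stand in for trained models. It shows only how the two pipelines are wired, with one (break, pause) decision per word juncture.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: one feature dict per word juncture.
Juncture = Dict[str, str]
# Label per juncture: (phrase_break, pause). Boolean labels correspond to a
# single-level encoding of phrase structure, chosen here for simplicity.
Label = Tuple[bool, bool]


def cascaded_predict(
    junctures: List[Juncture],
    break_model: Callable[[Juncture], bool],
    pause_model: Callable[[Juncture, bool], bool],
) -> List[Label]:
    """Predict phrase breaks first, then pauses conditioned on the breaks."""
    labels = []
    for j in junctures:
        brk = break_model(j)          # stage 1: phrase structure
        pause = pause_model(j, brk)   # stage 2: pause, given stage 1 output
        labels.append((brk, pause))
    return labels


def joint_predict(
    junctures: List[Juncture],
    joint_model: Callable[[Juncture], Label],
) -> List[Label]:
    """Predict the combined (break, pause) label in a single step."""
    return [joint_model(j) for j in junctures]


# Toy stand-ins for trained models (illustrative rules only).
def toy_break_model(j: Juncture) -> bool:
    return j["punct"] in {",", "."}


def toy_pause_model(j: Juncture, brk: bool) -> bool:
    return brk and j["punct"] == "."


def toy_joint_model(j: Juncture) -> Label:
    brk = j["punct"] in {",", "."}
    return (brk, brk and j["punct"] == ".")
```

In the cascaded version an error in the first stage propagates to the second, since the pause decision is conditioned on the predicted break; the joint version makes both decisions at once.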
Cite as: Burrows, T., Jackson, P., Knill, K., Sityaev, D. (2005) Combining models of prosodic phrasing and pausing. Proc. Interspeech 2005, 1829-1832, doi: 10.21437/Interspeech.2005-557
@inproceedings{burrows05_interspeech,
  author={Tina Burrows and Peter Jackson and Katherine Knill and Dmitry Sityaev},
  title={{Combining models of prosodic phrasing and pausing}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1829--1832},
  doi={10.21437/Interspeech.2005-557}
}