September 22-25, 1997
We present results of a comparison between two prosody prediction algorithms, showing that incorporating information from a parser significantly improves the performance of our text-to-speech synthesiser. We used a stochastic tree-based parser to generate a tagged and bracketed representation of the input text, and then interpreted this higher-level information to produce a ToBI-type prosodic annotation of the text. From this annotation an intonation contour was predicted for use in synthesising the speech. Results show that, when compared against a human reading of the same test utterances, prediction of prosodic phrasing and of focal prominence is improved by 56% and 62% respectively over previous methods.
Bibliographic reference. Campbell, Nick / Hebert, Tony / Black, Ezra (1997): "Parsers, prominence, and pauses", In EUROSPEECH-1997, 979-982.