Fifth ISCA ITRW on Speech Synthesis
June 14-16, 2004
This paper concerns the evaluation of prosody prediction at the symbolic level, in particular the locations of pitch accents and intonational boundaries. One evaluation method is to ask an expert to annotate text prosodically and to compare the system's predictions against this reference. However, this ignores the issue of optionality: there is usually more than one acceptable way to place accents and boundaries. Therefore, predictions that do not match the reference are not necessarily wrong. We propose dealing with this issue by means of a 3-class annotation that includes a class for optional accents/boundaries. In a prosody prediction experiment using a memory-based learner, we show that evaluating against a 3-class annotation derived from multiple independent 2-class annotations allows us to identify the real prediction errors and to better estimate the real performance. Next, we show that a 3-class annotation produced directly by a single annotator yields a reasonable approximation of the more expensive 3-class annotation derived from multiple annotations. Finally, the results of a larger-scale experiment confirm our findings.
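The idea of evaluating against a 3-class reference can be illustrated with a minimal sketch. The abstract does not state the exact rule for deriving the 3-class labels from multiple 2-class annotations, so the code assumes a simple one (an assumption, not necessarily the paper's method): a word position is "required" if every annotator marks an accent there, "absent" if none does, and "optional" on any disagreement. A binary prediction then counts as an error only when it contradicts a required or absent position. The names `derive_three_class`, `count_errors`, `ABSENT`, `OPTIONAL` and `REQUIRED` are hypothetical.

```python
from typing import List

# Hypothetical integer codes for the three reference classes (illustrative only).
ABSENT, OPTIONAL, REQUIRED = 0, 1, 2


def derive_three_class(annotations: List[List[int]]) -> List[int]:
    """Combine several binary (0/1) annotations of the same word sequence.

    Assumed rule: unanimous 1 -> REQUIRED, unanimous 0 -> ABSENT,
    any disagreement between annotators -> OPTIONAL.
    """
    n_words = len(annotations[0])
    if any(len(a) != n_words for a in annotations):
        raise ValueError("all annotations must cover the same words")
    result = []
    for votes in zip(*annotations):  # votes: one label per annotator
        total = sum(votes)
        if total == len(votes):
            result.append(REQUIRED)
        elif total == 0:
            result.append(ABSENT)
        else:
            result.append(OPTIONAL)
    return result


def count_errors(predicted: List[int], reference: List[int]) -> int:
    """Count binary predictions that conflict with a 3-class reference.

    A prediction at an OPTIONAL position is accepted either way.
    """
    if len(predicted) != len(reference):
        raise ValueError("prediction and reference lengths differ")
    errors = 0
    for p, r in zip(predicted, reference):
        if r == REQUIRED and p == 0:  # missed an obligatory accent
            errors += 1
        elif r == ABSENT and p == 1:  # placed an accent nobody accepts
            errors += 1
    return errors


if __name__ == "__main__":
    # Three hypothetical annotators marking accents on a five-word phrase.
    a = [1, 0, 1, 0, 0]
    b = [1, 0, 0, 0, 1]
    c = [1, 0, 1, 0, 0]
    reference = derive_three_class([a, b, c])
    print(reference)  # [2, 0, 1, 0, 1]

    predicted = [1, 1, 0, 0, 1]
    # Against annotator a alone, three positions mismatch...
    print(sum(p != r for p, r in zip(predicted, a)))  # 3
    # ...but only one of them contradicts all annotators.
    print(count_errors(predicted, reference))  # 1
```

In this toy example, two of the three mismatches against a single annotator fall on positions where the annotators themselves disagree, which is the kind of "non-error" the 3-class evaluation is meant to filter out.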
Bibliographic reference. Marsi, Erwin (2004): "Optionality in evaluating prosody prediction", In SSW5-2004, 13-18.