In this study, we investigated the speaker's intentions that listeners perceive through subtly different sentence-final intonations. Approximately 2,000 sentence utterances were recorded, and the fundamental frequency (F0) contours at the last vowel of those sentences were classified with a standard clustering algorithm. Various F0 contours were found: not only simple rising and falling intonations but also rise-fall and fall-rise intonations. To reveal the relationship between intonation and intention, 10 representative contours were selected on the basis of the clustering results. Using the selected contours, a subjective evaluation was conducted. Six Japanese sentences whose meanings could differ according to the sentence-final intonation were synthesized, and the F0 contour at the last vowel of each sentence was replaced with the selected contours. The evaluation by nine listeners showed that, for example, a certain falling intonation could express 'conviction', while another that differed slightly in shape could convey 'doubt'. We found that subtle differences in the sentence-final F0 shape conveyed various nuances and connotations.
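The clustering and representative-selection step described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so plain k-means is assumed here. The contours are synthetic toy data rather than the recorded utterances, 4 clusters are used instead of the 10 representative contours in the study, and every function and variable name is hypothetical.

```python
import math
import random


def resample(contour, n_points=10):
    """Linearly interpolate a contour (at least 2 samples) to n_points."""
    last = len(contour) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[lo + 1] * frac)
    return out


def normalize(contour):
    """Subtract the mean so clusters reflect shape, not overall pitch height."""
    mean = sum(contour) / len(contour)
    return [v - mean for v in contour]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives_of(points, labels, centroids):
    """For each cluster, pick the index of the contour nearest its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps[j] = min(members, key=lambda i: sq_dist(points[i], c))
    return reps


# Synthetic final-vowel F0 contours (assumed offsets in Hz), five per shape.
rng = random.Random(0)
shapes = {
    "rise": lambda t: 20.0 * t,
    "fall": lambda t: -20.0 * t,
    "rise-fall": lambda t: 20.0 * math.sin(math.pi * t),
    "fall-rise": lambda t: -20.0 * math.sin(math.pi * t),
}
names, contours = [], []
for name, f in shapes.items():
    for _ in range(5):
        n = rng.randint(8, 15)  # vowels differ in duration
        raw = [f(i / (n - 1)) + rng.gauss(0.0, 0.5) for i in range(n)]
        contours.append(normalize(resample(raw)))
        names.append(name)

labels, centroids = kmeans(contours, k=4)
representatives = representatives_of(contours, labels, centroids)
for j, idx in sorted(representatives.items()):
    print(j, names[idx], [round(v, 1) for v in contours[idx]])
```

Resampling to a fixed length and removing the mean are design choices of this sketch, made so that contours from vowels of different duration and pitch height are compared by shape alone. The study's actual preprocessing is not described in the abstract.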
Index Terms: speech synthesis, sentence-final intonation, speaker's intention
Bibliographic reference. Iwata, Kazuhiko / Kobayashi, Tetsunori (2012): "Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis", In INTERSPEECH-2012, 442-445.