Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Automatic Labeling of Japanese Prosody Using J-ToBI Style Description

Hiroaki Noguchi (1,2), Kazuhisa Kiriyama (2), Hiroshi Matsuda (2), Miki Taniguchi (3), Yasuharu Den (2), Yasuhiro Katagiri (1)

(1) ATR Media Integration & Communications Research Laboratories; (2) Graduate School of Information Science, Nara Institute of Science and Technology; (3) Graduate School of Language and Culture, Osaka University, Japan

Speech corpora with prosodic labels are becoming increasingly important, not only for speech synthesis but also for discourse modeling. J-ToBI, a widely used labeling system for Japanese prosody, is however insufficient for applications such as discourse modeling, and it even lacks an accurate method for automatic labeling. In this paper, we propose an automatic labeling method for J-ToBI style description of tonal events in Japanese speech, with the aim of applying it to general-purpose labeling of Japanese prosody. The proposed method takes into account the linguistic constraints on the tone structure, which improves the accuracy of automatic labeling. We achieve fairly good performance in a preliminary experiment using a read speech corpus.

[...] parameters, which, we believe, is suitable for applications like discourse modeling. One drawback of the model, however, is that it does not take into account the tone structure of the language to be modeled; it distinguishes only two types of tonal events, i.e., accents and boundary tones, which might be sufficient for English but is obviously insufficient for Japanese. The incorporation of linguistically adequate constraints on the tone structure, such as the ones utilized in J-ToBI, would enhance the model and improve the accuracy of automatic labeling when applied to Japanese. In our model, like the original tilt model, the labeling [...]

