ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Performance of Segmental and Prosodic Labelling of Spontaneous Speech

Hideaki Kikuchi, Kikuo Maekawa

National Institute for Japanese Language, Japan; Waseda University, Tokyo, Japan

In an attempt to construct a large-scale database of spontaneous speech, the authors planned to give segmental and prosodic labels to spontaneous Japanese speech. This paper reports the performance of that labeling. First, the performance of automatic segmental labeling using a Hidden Markov Model was verified. About four hours of sample speech was automatically labeled with phonemes and compared with the results of hand labeling. The average difference in label boundary position relative to the hand-labeled data was 14.3 ms. Second, the performance of prosodic labeling with a newly proposed labeling scheme named X-JToBI (extended J-ToBI) was verified. Analysis of the labeled data showed that the newly added inventories appeared in the spontaneous speech data and that the rate of inter-labeler agreement increased for nearly all types of labels.
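As an illustration of the boundary comparison described above, the following minimal sketch computes an average boundary difference between an automatic and a hand labeling of the same utterance. It rests on assumptions not stated in the abstract: that the two labelings contain the same label sequence so that boundaries can be paired one to one, and that the difference is taken as an absolute value. The function name and the sample values are hypothetical and are not data from the paper.

```python
def mean_boundary_difference_ms(auto_boundaries, hand_boundaries):
    """Mean absolute difference (ms) between paired label boundaries.

    Assumption: both lists hold boundary times in milliseconds for the
    same label sequence, so the i-th entries refer to the same boundary.
    """
    if len(auto_boundaries) != len(hand_boundaries):
        raise ValueError("boundary lists must have the same length")
    if not auto_boundaries:
        raise ValueError("boundary lists must not be empty")
    total = sum(abs(a - h) for a, h in zip(auto_boundaries, hand_boundaries))
    return total / len(auto_boundaries)


# Hypothetical boundary times in milliseconds (not data from the paper).
auto = [100.0, 250.0, 400.0, 520.0]
hand = [110.0, 240.0, 400.0, 500.0]
print(mean_boundary_difference_ms(auto, hand))  # prints 10.0
```

Averaging such per-boundary differences over a whole corpus would yield a single figure comparable in kind to the 14.3 ms reported here, although the exact procedure used by the authors may differ.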

