ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
In an attempt to construct a large-scale database of spontaneous speech, the authors planned to assign segmental and prosodic labels to spontaneous Japanese speech. This paper reports the performance of that labeling. First, the performance of automatic segmental labeling by a Hidden Markov Model (HMM) was verified. About four hours of sample speech was automatically labeled at the phoneme level, and the results were compared with hand labeling. The average difference between the automatically placed and the hand-labeled boundaries was 14.3 ms. Second, the performance of prosodic labeling with a newly proposed labeling scheme named X-JToBI (extended J-ToBI) was verified. Analysis of the labeled data showed that the label inventories newly added in X-JToBI did appear in the spontaneous speech data, and that the rate of inter-labeler agreement increased for nearly all types of labels.
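The two evaluation measures mentioned above can be illustrated with a minimal sketch. It assumes that the automatic and hand labels contain the same phoneme sequence, so that the i-th automatic boundary pairs with the i-th hand-labeled boundary, and that "boundary difference" means the absolute time difference. It also uses simple percent agreement for inter-labeler agreement. These are assumptions for illustration; the abstract does not state the exact formulas used. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def mean_boundary_difference_ms(auto: Sequence[float], hand: Sequence[float]) -> float:
    """Mean absolute difference, in milliseconds, between paired boundary times.

    Boundary times are given in seconds. Assumes (hypothetically) a one-to-one
    pairing: auto[i] and hand[i] mark the same phoneme boundary.
    """
    if len(auto) != len(hand) or not auto:
        raise ValueError("boundary lists must be non-empty and of equal length")
    total = sum(abs(a - h) for a, h in zip(auto, hand))
    return 1000.0 * total / len(auto)


def agreement_rate(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of positions at which two labelers chose the same label.

    This is raw percent agreement; it is not chance-corrected (e.g. not kappa).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(labels_a, labels_b) if x == y)
    return matches / len(labels_a)


if __name__ == "__main__":
    # Toy boundary times in seconds (made up for illustration).
    auto_bounds = [0.10, 0.25, 0.42]
    hand_bounds = [0.11, 0.23, 0.42]
    print(f"{mean_boundary_difference_ms(auto_bounds, hand_bounds):.1f} ms")

    # Toy prosodic labels from two labelers (made up for illustration).
    labeler_1 = ["H-", "L%", "H-", "L%"]
    labeler_2 = ["H-", "L%", "L-", "L%"]
    print(agreement_rate(labeler_1, labeler_2))
```

With the toy data, the boundary differences are 10, 20 and 0 ms, giving a mean of about 10 ms, and the two labelers agree on 3 of 4 labels. A real comparison would first have to align the two label sequences when they differ in phoneme content.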
Bibliographic reference. Kikuchi, Hideaki / Maekawa, Kikuo (2003): "Performance of segmental and prosodic labelling of spontaneous speech", in SSPR-2003, paper TAP6.