8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese

Tatsuya Kawahara (1), Kiyotaka Uchimoto (2), Hitoshi Isahara (2), Kazuya Shitaoka (1)

(1) Kyoto University, Japan
(2) National Institute of Information and Communications Technology, Japan

This paper addresses the automatic detection of sentence boundaries and of dependencies between Japanese phrasal units, called bunsetsus, in a spontaneous speech corpus. In spontaneous speech, the biggest problem for dependency structure analysis is that sentence boundaries are ambiguous. We propose two methods for improving the accuracy of sentence boundary detection in spontaneous Japanese: one based on unsupervised learning and the other based on machine learning. Experimental results show that the proposed methods achieve an F-measure of 84.85 for sentence boundary detection, and that using the automatically detected sentence boundaries also improves the accuracy of dependency structure analysis.
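For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure for sentence boundary detection could be computed. The abstract does not specify the scoring procedure, so this assumes the balanced F-measure (harmonic mean of precision and recall) over boundary positions, where each position is the index of a bunsetsu after which a sentence ends. The function name `boundary_f_measure` and the sample indices are hypothetical and are not taken from the paper.

```python
def boundary_f_measure(reference, predicted):
    """Return (precision, recall, F-measure) for predicted sentence boundaries.

    Both arguments are iterables of boundary positions (assumed here to be
    bunsetsu indices after which a sentence ends). Values are in [0, 1].
    """
    ref, pred = set(reference), set(predicted)
    correct = len(ref & pred)  # boundaries found at the right position

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: four reference boundaries, three predicted,
# two of which match.
p, r, f = boundary_f_measure({3, 7, 12, 20}, {3, 7, 15})
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

A score such as the 84.85 reported above would correspond to this value multiplied by 100, assuming the paper reports the F-measure as a percentage.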

Bibliographic reference.  Kawahara, Tatsuya / Uchimoto, Kiyotaka / Isahara, Hitoshi / Shitaoka, Kazuya (2004): "Dependency structure analysis and sentence boundary detection in spontaneous Japanese", In INTERSPEECH-2004, 1353-1356.