ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Morphological Analysis of the Corpus of Spontaneous Japanese

Kiyotaka Uchimoto (1), Chikashi Nobata (1), Atsushi Yamada (1), Satoshi Sekine (2), Hitoshi Isahara (1)

(1) Communications Research Laboratory, Kyoto, Japan
(2) New York University, New York, USA

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and a method for accurately tagging a large spontaneous speech corpus. We show that semi-automatic analysis can be expected to achieve a precision of over 99% for detecting and tagging short words and 97% for long words, the two types of words that make up the corpus.


Bibliographic reference. Uchimoto, Kiyotaka / Nobata, Chikashi / Yamada, Atsushi / Sekine, Satoshi / Isahara, Hitoshi (2003): "Morphological analysis of the corpus of spontaneous Japanese", in SSPR-2003, paper TAO2.