Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

A Stochastic Morphological Analyzer for Spontaneously Spoken Languages

Masaaki Nagata

NTT Network Information Systems Laboratories, Kanagawa, Japan

We present a morphological analysis method for the phonetic transcription of spontaneous speech that uses a stochastic language modeling technique and an efficient two-pass N-best search strategy. It segments a phonetically transcribed utterance into words, assigns a part of speech to each segmented word, and converts the phonetic transcription into an orthographic transcription, which, in the case of Japanese, means conversion from "hiragana" (phonogram) to "kanji" (ideogram). The morphological analyzer can handle pauses, interjections, restatements and chimings, all of which are characteristic of spontaneous speech, by learning the parameters of the language model directly from the phonetic transcription. The proposed morphological analyzer achieves 95.0% recall and 95.3% precision on closed text when trained and tested on a portion (containing 172,826 words) of the ATR Corpus, telephone dialogues in the conference registration domain.
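
The three tasks named above (segmentation, part-of-speech assignment, and phonetic-to-orthographic conversion) can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a simple unigram word model and a 1-best Viterbi search instead of the two-pass N-best search, and the lexicon, its probabilities, and the function name `analyze` are hypothetical. Romanized input is used in place of hiragana purely for readability.

```python
import math

# Hypothetical toy lexicon (assumption, not from the paper):
# phonetic form -> list of (orthographic form, part of speech, probability)
LEXICON = {
    "eeto": [("ええと", "interjection", 0.01)],
    "watashi": [("私", "pronoun", 0.02)],
    "wa": [("は", "particle", 0.05), ("輪", "noun", 0.001)],
    "gakusei": [("学生", "noun", 0.01)],
    "desu": [("です", "auxiliary", 0.04)],
    "de": [("で", "particle", 0.04)],
    "su": [("酢", "noun", 0.0005)],
}


def analyze(phonetic):
    """Return the most probable list of (phonetic, orthographic, POS) triples
    under a unigram model, or None if the lexicon cannot cover the input."""
    n = len(phonetic)
    max_len = max(len(key) for key in LEXICON)
    # best[i] = (log-probability of the best analysis of phonetic[:i], backpointer)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score = best[start][0]
            if prev_score == -math.inf:
                continue  # no analysis reaches this start position
            for orth, pos, prob in LEXICON.get(phonetic[start:end], []):
                score = prev_score + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, (start, orth, pos))
    if best[n][0] == -math.inf:
        return None
    # Follow the backpointers from the end of the utterance to the start.
    result = []
    i = n
    while i > 0:
        start, orth, pos = best[i][1]
        result.append((phonetic[start:i], orth, pos))
        i = start
    result.reverse()
    return result


if __name__ == "__main__":
    for triple in analyze("eetowatashiwagakuseidesu"):
        print(triple)
```

In this sketch the interjection "eeto" is simply another lexicon entry, which loosely mirrors the idea of learning spontaneous-speech phenomena from the transcription itself rather than filtering them out beforehand.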

Bibliographic reference.  Nagata, Masaaki (1994): "A stochastic morphological analyzer for spontaneously spoken languages", In ICSLP-1994, 795-798.