ISCA Archive ICSLP 1994

A stochastic morphological analyzer for spontaneously spoken languages

Masaaki Nagata

We present a morphological analysis method for the phonetic transcription of spontaneous speech using a stochastic language modeling technique and an efficient two-pass N-best search strategy. It can segment a phonetically transcribed utterance into words, assign a part of speech to each segmented word, and convert the phonetic transcription into an orthographic transcription, which, in the case of Japanese, means the conversion from "hiragana" (phonogram) to "kanji" (ideogram). The morphological analyzer can handle pauses, interjections, restatements and chimings, all of which are characteristic of spontaneous speech, by learning the parameters of the language model directly from the phonetic transcription. The proposed morphological analyzer achieves 95.0% recall and 95.3% precision on closed text when trained and tested on a portion (containing 172,826 words) of the ATR Corpus, which consists of telephone dialogues in the conference registration domain.
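
The three tasks named in the abstract (segmentation, part-of-speech assignment, and phonetic-to-orthographic conversion) can be illustrated with a minimal sketch. It is not the method of the paper: it swaps in a plain 1-best Viterbi search over a word unigram model in place of the two-pass N-best search and the paper's language model, and it uses romanized input for readability. The lexicon, its probabilities, and the function name `segment` are all hypothetical.

```python
import math

# Hypothetical toy lexicon (an assumption for illustration only):
# phonetic form -> list of (orthographic form, part of speech, probability).
LEXICON = {
    "kaigi": [("会議", "noun", 0.04)],
    "ni": [("に", "particle", 0.10), ("二", "numeral", 0.01)],
    "kai": [("会", "noun", 0.01)],
    "gi": [("義", "noun", 0.001)],
    "eeto": [("ええと", "interjection", 0.02)],
}


def segment(phonetic):
    """Return the most probable list of (orthography, POS) pairs, or None.

    Simplification: 1-best Viterbi with unigram probabilities, not the
    two-pass N-best search described in the abstract.
    """
    n = len(phonetic)
    # best[i] = (log probability, start index of last word, lexicon entry)
    # for the best analysis of phonetic[:i].
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(end):
            if best[start][0] == -math.inf:
                continue  # no analysis reaches this start position
            for surface, pos, prob in LEXICON.get(phonetic[start:end], []):
                score = best[start][0] + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, start, (surface, pos))
    if best[n][0] == -math.inf:
        return None  # the lexicon cannot cover the whole input
    # Follow the backpointers from the end of the input to the beginning.
    words = []
    i = n
    while i > 0:
        _, start, entry = best[i]
        words.append(entry)
        i = start
    return list(reversed(words))


print(segment("eetokaigini"))
```

Because the interjection "eeto" is an ordinary lexicon entry here, the sketch treats it like any other word, which loosely mirrors the idea of learning spontaneous-speech phenomena from the transcription itself rather than filtering them out beforehand.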


Cite as: Nagata, M. (1994) A stochastic morphological analyzer for spontaneously spoken languages. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 795-798

@inproceedings{nagata94_icslp,
  author={Masaaki Nagata},
  title={{A stochastic morphological analyzer for spontaneously spoken languages}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={795--798}
}