Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Sentence Boundary Detection of Spontaneous Japanese Using Statistical Language Model and Support Vector Machines

Yuya Akita (1), Masahiro Saikou (1), Hiroaki Nanjo (2), Tatsuya Kawahara (1)

(1) Kyoto University, Japan; (2) Ryukoku University, Japan

This paper presents two approaches to sentence boundary detection in spontaneous Japanese, one based on a statistical language model (SLM) and the other on support vector machines (SVM). In the SLM-based approach, linguistic likelihoods and the occurrence of pauses are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. The SVM-based approach, by contrast, is adopted to make classification robust against a wide variety of expressions and speech recognition errors. Detection is performed by an SVM-based text chunker that uses lexical and pause information as features. We evaluated these approaches on manual and automatic transcriptions of spontaneous lectures and speeches, and achieved F-measures of 0.85 and 0.78, respectively.
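
As a rough illustration of the SLM-based idea summarized above, the following Python sketch hypothesizes a boundary after a word only when a pause follows it, a language-model score favors a boundary, and the word matches an end-of-sentence pattern. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how likelihoods and pauses are combined, so the rule of requiring all three cues, the function names, the romanized toy words, the scores, and the pattern set are all hypothetical. The F-measure helper uses the standard harmonic mean of precision and recall.

```python
from typing import Callable, List, Sequence, Set


def detect_boundaries(
    words: Sequence[str],
    pause_after: Sequence[bool],
    boundary_score: Callable[[int], float],
    end_expressions: Set[str],
    threshold: float = 0.0,
) -> List[int]:
    """Return indices i where a sentence boundary is hypothesized after words[i].

    Assumption: a boundary needs a pause, a language-model score above
    the threshold, and an end-of-sentence expression (false-alarm filter).
    """
    boundaries = []
    for i, word in enumerate(words):
        if not pause_after[i]:
            continue  # no pause observed after this word
        if boundary_score(i) <= threshold:
            continue  # language model does not favor a boundary here
        if word not in end_expressions:
            continue  # heuristic pattern filter rejects this candidate
        boundaries.append(i)
    return boundaries


def f_measure(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Harmonic mean of precision and recall over boundary positions."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy input (romanized); scores stand in for a log-likelihood
# difference between "boundary" and "no boundary" under a trained SLM.
words = ["kyou", "wa", "hare", "desu", "ashita", "wa", "ame", "desu"]
pause_after = [False, False, True, True, False, False, False, True]
toy_scores = {2: 0.5, 3: 1.2, 7: 0.9}
end_expressions = {"desu", "masu"}

result = detect_boundaries(
    words, pause_after, lambda i: toy_scores.get(i, -1.0), end_expressions
)
print(result)
print(f_measure(result, [3, 7]))
```

In this toy input the candidate after "hare" has both a pause and a positive score but is rejected by the pattern filter, which is the kind of false alarm the heuristic patterns are meant to suppress.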

Bibliographic reference. Akita, Yuya / Saikou, Masahiro / Nanjo, Hiroaki / Kawahara, Tatsuya (2006): "Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines", in INTERSPEECH-2006, paper 1370-Tue2A2O.4.