INTERSPEECH 2006 - ICSLP
This paper presents two approaches to sentence boundary detection in spontaneous Japanese, one based on a statistical language model (SLM) and the other on support vector machines (SVM). In the SLM-based approach, linguistic likelihoods and the occurrence of pauses are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. The SVM-based approach, in contrast, is adopted to make classification robust against a wide variety of expressions and against speech recognition errors. Here, detection is performed by an SVM-based text chunker that uses lexical and pause information as features. We evaluated these approaches on manual and automatic transcripts of spontaneous lectures and speeches, and achieved F-measures of 0.85 and 0.78, respectively.
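As an illustration of the reported metric, the following minimal sketch computes precision, recall, and F-measure for boundary detection. It assumes that boundaries are represented as word indices and that a hypothesized boundary counts as correct only when it matches a reference position exactly; the abstract does not state the matching criterion, so this is an assumption. The function name `boundary_f_measure` and the toy data are hypothetical and are not taken from the paper.

```python
def boundary_f_measure(reference, hypothesis):
    """Return (precision, recall, F-measure) for two collections of boundary positions.

    Assumption: a hypothesized boundary is correct only if its position
    exactly matches a reference boundary position.
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (hypothetical): indices of words that are followed by a sentence boundary.
reference = [4, 9, 15, 22]
hypothesis = [4, 9, 17, 22, 30]

precision, recall, f_measure = boundary_f_measure(reference, hypothesis)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

In this toy case three of the five hypothesized boundaries match the four reference boundaries, so one missed boundary lowers recall and two false alarms lower precision. This is the trade-off that the false-alarm suppression described in the abstract is meant to address.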
Bibliographic reference. Akita, Yuya / Saikou, Masahiro / Nanjo, Hiroaki / Kawahara, Tatsuya (2006): "Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines", In INTERSPEECH-2006, paper 1370-Tue2A2O.4.