In spoken language, sentence boundaries are much less explicit than in written language. Because conventional natural language processing (NLP) techniques generally assume that sentence boundaries are given, the boundaries must be detected accurately before such techniques can be applied to spoken language. Classification frameworks such as Support Vector Machines (SVMs) and Conditional Random Fields (CRFs) can be used to detect the boundaries. These methods determine sentence boundaries from local word sequences around each candidate boundary that resemble sentence ends. However, they do not evaluate whether each block delimited by the boundaries is appropriate as a sentence. We previously proposed sequential dependency analysis (SDA), which extracts the dependency structure of unsegmented word sequences and includes a subsidiary mechanism for sentence boundary detection. In this paper, we extend SDA by combining it with CRFs so that it reflects both the properties of local word sequences and the appropriateness of each block as a sentence, thereby achieving more accurate sentence boundary detection. Experimental results show that the proposed method provides better detection accuracy than SVMs or CRFs alone. Because it is based on the SDA framework, the method can also operate sequentially and can therefore be used in on-line spoken language applications.
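The general idea of weighing local boundary evidence against the appropriateness of each resulting block can be illustrated with a small sketch. This is not the SDA algorithm or the authors' implementation: it is a batch dynamic program (the method described above works sequentially), and every name in it is hypothetical. `boundary_prob` stands in for the per-word boundary probabilities that a local classifier such as a CRF might produce, and `segment_score` stands in for any score of how sentence-like a block is, for which the paper uses dependency analysis.

```python
import math
from typing import Callable, List, Sequence


def segment(
    words: Sequence[str],
    boundary_prob: Sequence[float],
    segment_score: Callable[[Sequence[str]], float],
    weight: float = 1.0,
) -> List[int]:
    """Return the indices i such that a sentence ends after words[i].

    Illustrative only: combines local boundary log-probabilities with a
    block-level score and picks the best segmentation by dynamic programming.
    """
    n = len(words)
    eps = 1e-9  # avoids log(0) for probabilities of exactly 0 or 1
    log_b = [math.log(max(p, eps)) for p in boundary_prob]
    log_nb = [math.log(max(1.0 - p, eps)) for p in boundary_prob]

    # Prefix sums of the "no boundary" log-probabilities.
    prefix = [0.0]
    for value in log_nb:
        prefix.append(prefix[-1] + value)

    # best[j]: best total score for words[:j], with a boundary placed at j.
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            # Block words[i:j]: no boundary inside it, one boundary at its end.
            local = (prefix[j - 1] - prefix[i]) + log_b[j - 1]
            score = best[i] + local + weight * segment_score(words[i:j])
            if score > best[j]:
                best[j] = score
                back[j] = i

    # Follow the back-pointers from the end of the sequence.
    ends = []
    j = n
    while j > 0:
        ends.append(j - 1)
        j = back[j]
    return sorted(ends)


def toy_segment_score(block: Sequence[str]) -> float:
    """Toy stand-in for sentence appropriateness: prefers three-word blocks."""
    return -abs(len(block) - 3)


words = ["i", "went", "home", "it", "was", "late"]
probs = [0.05, 0.1, 0.6, 0.05, 0.1, 0.9]  # made-up local boundary probabilities
print(segment(words, probs, toy_segment_score))
```

With `weight=0.0` the result depends only on the local probabilities, which corresponds to using the local classifier alone; raising `weight` lets the block-level score override weak local evidence.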
Cite as: Oba, T., Hori, T., Nakamura, A. (2006) Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking. Proc. Interspeech 2006, paper 1657-Tue2CaP.2, doi: 10.21437/Interspeech.2006-351
@inproceedings{oba06_interspeech,
  author={Takanobu Oba and Takaaki Hori and Atsushi Nakamura},
  title={{Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1657-Tue2CaP.2},
  doi={10.21437/Interspeech.2006-351}
}