Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models

Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel


In this paper we propose a solution that detects sentence boundaries in speech transcripts. First we train a purely lexical model with a deep neural network, which takes word vectors as its only input feature. A simple acoustic model is then prepared as well. Because the two models work independently, they can be trained on different data. In the next step, the posterior probabilities of the lexical and acoustic models are combined in a heuristic 2-stage joint decision scheme to classify sentence boundary positions. This approach ensures that either model can be updated or swapped freely in actual use. Evaluation on TED Talks shows that the proposed lexical model achieves good results: 75.5% accuracy on ASR transcripts containing recognition errors and 82.4% on error-free manual references. The joint decision scheme can further improve the accuracy by 3-10% when acoustic data is available.
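The abstract does not spell out the rules of the 2-stage joint decision scheme, so the following is only an illustrative sketch of how posteriors from two independent models could be fused in two stages. Everything in it is an assumption rather than the authors' method: the function name `joint_decision`, the confidence bounds `hi` and `lo`, the fusion `weight`, and the final `threshold` are all hypothetical. Stage 1 accepts the lexical model's verdict when it is confident; stage 2 falls back to a weighted average with the acoustic posterior for the remaining positions.

```python
from typing import List, Optional, Sequence


def joint_decision(
    p_lex: Sequence[float],
    p_ac: Optional[Sequence[float]] = None,
    hi: float = 0.9,          # hypothetical: lexical posterior at or above this is a boundary
    lo: float = 0.1,          # hypothetical: lexical posterior at or below this is not a boundary
    weight: float = 0.5,      # hypothetical: share of the lexical posterior in the fused score
    threshold: float = 0.5,   # hypothetical: decision threshold
) -> List[bool]:
    """Return one boundary decision per candidate position (illustrative only)."""
    # Without acoustic data, fall back to the lexical model alone.
    if p_ac is None:
        return [p >= threshold for p in p_lex]
    if len(p_lex) != len(p_ac):
        raise ValueError("lexical and acoustic posteriors must align position by position")

    decisions = []
    for pl, pa in zip(p_lex, p_ac):
        if pl >= hi:
            # Stage 1: the lexical model is confident that this is a boundary.
            decisions.append(True)
        elif pl <= lo:
            # Stage 1: the lexical model is confident that this is not a boundary.
            decisions.append(False)
        else:
            # Stage 2: uncertain position, so fuse both posteriors.
            fused = weight * pl + (1.0 - weight) * pa
            decisions.append(fused >= threshold)
    return decisions


if __name__ == "__main__":
    lexical = [0.95, 0.05, 0.6, 0.4]
    acoustic = [0.1, 0.9, 0.2, 0.9]
    print(joint_decision(lexical, acoustic))
    print(joint_decision(lexical))
```

Because the function only consumes posterior probabilities, either model could be retrained or replaced without touching the decision step, which is the property the abstract emphasizes.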


DOI: 10.21437/Interspeech.2016-257

Cite as

Che, X., Luo, S., Yang, H., Meinel, C. (2016) Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models. Proc. Interspeech 2016, 2528-2532.

BibTeX
@inproceedings{Che+2016,
author={Xiaoyin Che and Sheng Luo and Haojin Yang and Christoph Meinel},
title={Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-257},
url={http://dx.doi.org/10.21437/Interspeech.2016-257},
pages={2528--2532}
}