15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

A Deep Neural Network Approach for Sentence Boundary Detection in Broadcast News

Chenglin Xu (1), Lei Xie (1), Guangpu Huang (2), Xiong Xiao (2), Eng Siong Chng (2), Haizhou Li (2)

(1) Northwestern Polytechnical University, China
(2) TL@NTU, Singapore

This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier that labels each position as either boundary or non-boundary. The approach is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN takes prosodic features as input and non-linearly maps them to boundary/non-boundary posterior probabilities. These posterior probabilities are then combined with lexical features, and the integrated features are modeled by a linear-chain CRF. Finally, the CRF labels the inter-word positions as boundary or non-boundary by Viterbi decoding. Experiments show that, compared with the state-of-the-art DTCRF approach, the proposed DNN-CRF approach reduces the NIST boundary detection error by 16.7% on reference transcripts and by 4.1% on speech recognition transcripts.
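The final decoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hand-set transition scores, and the single additive "lexical bonus" are assumptions made for illustration only, whereas a real linear-chain CRF would learn weights over many lexical features from data. The sketch only shows how per-position boundary posteriors (as a DNN might produce) can be turned into scores and decoded with Viterbi over the two labels.

```python
import math

LABELS = ("non-boundary", "boundary")


def emissions_from_posteriors(posteriors, lexical_bonus):
    """Build per-position label scores (hypothetical feature combination).

    posteriors: boundary posterior at each inter-word position (assumed DNN output).
    lexical_bonus: additive score for the boundary label at each position,
    standing in for learned lexical feature weights (an assumption).
    """
    scores = []
    for p, bonus in zip(posteriors, lexical_bonus):
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logarithms finite
        scores.append([math.log(1.0 - p), math.log(p) + bonus])
    return scores


def viterbi(emission_scores, transition_scores):
    """Return the highest-scoring label sequence for a linear chain.

    emission_scores[t][j]: score of label j at position t.
    transition_scores[i][j]: score of moving from label i to label j.
    """
    if not emission_scores:
        return []
    k = len(LABELS)
    best = [list(emission_scores[0])]  # best[t][j]: best path score ending in j
    back = []                          # back[t-1][j]: best predecessor of j at t
    for t in range(1, len(emission_scores)):
        row, ptr = [], []
        for j in range(k):
            cands = [best[t - 1][i] + transition_scores[i][j] for i in range(k)]
            i_best = max(range(k), key=lambda i: cands[i])
            row.append(cands[i_best] + emission_scores[t][j])
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(range(k), key=lambda j: best[-1][j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [LABELS[i] for i in path]


# Hand-set transitions (assumption): discourage two boundaries in a row.
TRANSITIONS = [[0.0, 0.0], [0.0, -2.0]]

if __name__ == "__main__":
    scores = emissions_from_posteriors([0.1, 0.9, 0.2, 0.8], [0.0] * 4)
    print(viterbi(scores, TRANSITIONS))
```

Because decoding scores the whole sequence rather than each position independently, the transition term can override a locally likely label. With the penalty above, two adjacent positions with posteriors 0.7 and 0.6 decode to a boundary followed by a non-boundary.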

Bibliographic reference.  Xu, Chenglin / Xie, Lei / Huang, Guangpu / Xiao, Xiong / Chng, Eng Siong / Li, Haizhou (2014): "A deep neural network approach for sentence boundary detection in broadcast news", In INTERSPEECH-2014, 2887-2891.