BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End

Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li


In this paper, we propose a language-independent end-to-end architecture for prosodic boundary prediction based on BLSTM-CRF. The proposed architecture has three components: a word embedding layer, a BLSTM layer, and a CRF layer. The word embedding layer learns task-specific embeddings for prosodic boundary prediction. The BLSTM layer can efficiently use both past and future input features, while the CRF layer can efficiently use sentence-level information. We integrate these three components and train the whole model end-to-end. In addition, we investigate adding both character-level embeddings and context sensitive embeddings to this model, and employ an attention mechanism to combine the alternative word-level embeddings. With the attention mechanism, the model can decide how much information to use from each level of embeddings. Objective evaluation results show that the proposed BLSTM-CRF architecture achieves the best results on both Mandarin and English datasets, with absolute improvements of 3.21% and 3.74% in F1 score, respectively, for intonational phrase prediction, compared to the previous state-of-the-art method (BLSTM). The subjective evaluation results further indicate the effectiveness of the proposed methods.
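To illustrate what using sentence-level information in the CRF layer can mean in practice, the following is a minimal sketch of Viterbi decoding over per-token scores. It is not the authors' implementation. The tag names ("NB" for no boundary, "B" for boundary), the emission scores (standing in for BLSTM outputs), and the transition scores are all hypothetical values chosen for illustration.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   : score of tag j at token t (e.g. produced by a BLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    # best[j] holds the best score of any path ending in tag j so far
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # choose the previous tag that maximizes the path score into tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path], best[last]


# Hypothetical tag set: "NB" = no boundary after the token, "B" = boundary
TAGS = ["NB", "B"]

# Hypothetical per-token scores for a 4-token sentence (columns follow TAGS)
EMISSIONS = [
    [2.0, 0.0],
    [0.4, 0.6],
    [0.5, 0.6],
    [0.0, 2.0],
]

# Hypothetical transition scores: two consecutive boundaries are penalized
TRANSITIONS = [
    [0.0, 0.0],   # from NB
    [0.0, -1.0],  # from B
]

best_path, best_score = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)

# Token-by-token argmax ignores the transition scores entirely
greedy_path = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in EMISSIONS]

print(best_path)    # ['NB', 'B', 'NB', 'B']
print(greedy_path)  # ['NB', 'B', 'B', 'B']
```

In this toy setting, independent per-token decisions place three boundaries in a row, whereas joint decoding trades a slightly lower emission score at the third token for a sequence that avoids the penalized boundary-to-boundary transitions.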


DOI: 10.21437/Interspeech.2018-1472

Cite as: Zheng, Y., Tao, J., Wen, Z., Li, Y. (2018) BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End. Proc. Interspeech 2018, 47-51, DOI: 10.21437/Interspeech.2018-1472.


@inproceedings{Zheng2018,
  author={Yibin Zheng and Jianhua Tao and Zhengqi Wen and Ya Li},
  title={BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={47--51},
  doi={10.21437/Interspeech.2018-1472},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1472}
}