16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Automatic Phrase Boundary Labeling of Speech Synthesis Database Using Context-Dependent HMMs and N-Gram Prior Distributions

Qian Chen (1), Zhen-Hua Ling (1), Chen-Yu Yang (2), Li-Rong Dai (1)

(1) USTC, China
(2) A*STAR, Singapore

This paper presents an automatic phrase boundary labeling method for speech synthesis database annotation using context-dependent hidden Markov models (CD-HMMs) and n-gram prior distributions. At the training stage, CD-HMMs are built to describe the conditional distribution of acoustic features given the phonetic label and phrase boundary. In addition, n-gram models are estimated to represent the prior distributions of the phrase boundaries to be predicted. At the decoding stage, the CD-HMMs and n-gram models are combined to predict the phrase boundaries by Viterbi decoding under the maximum a posteriori (MAP) criterion. In our experiments, the proposed method with context-dependent bigram prior distributions improved the F-score of phrase boundary labeling from 72.2% to 79.6% on the Boston University Radio News Corpus (BURNC) and from 69.6% to 81.0% on the Blizzard Challenge 2007 database, compared with the method using only acoustic models.
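
The decoding step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes binary boundary labels (0 = no boundary, 1 = boundary), a single acoustic log-likelihood per word juncture standing in for the CD-HMM score, and a plain (not context-dependent) bigram prior. The function name `viterbi_map` and all argument names are hypothetical.

```python
def viterbi_map(acoustic_loglik, bigram_logprior, initial_logprior):
    """Return the MAP label sequence under a bigram prior.

    Assumptions (illustrative, not taken from the paper):
      acoustic_loglik[t][b]  log p(observation at juncture t | label b)
      bigram_logprior[a][b]  log P(label b | previous label a)
      initial_logprior[b]    log P(first label = b)
    """
    if not acoustic_loglik:
        return []
    n_labels = len(initial_logprior)

    # score[b]: best log posterior (up to a constant) of any path ending in b
    score = [initial_logprior[b] + acoustic_loglik[0][b] for b in range(n_labels)]
    backpointers = []

    for t in range(1, len(acoustic_loglik)):
        new_score, pointers = [], []
        for b in range(n_labels):
            # Best predecessor label for label b at juncture t
            best_prev = max(range(n_labels),
                            key=lambda a: score[a] + bigram_logprior[a][b])
            new_score.append(score[best_prev]
                             + bigram_logprior[best_prev][b]
                             + acoustic_loglik[t][b])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label
    path = [max(range(n_labels), key=lambda b: score[b])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With a uniform prior the function reduces to picking the acoustically most likely label at each juncture; a prior that penalizes consecutive boundaries can overturn a weak acoustic decision, which is the kind of effect the reported F-score improvement is attributed to.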

Bibliographic reference. Chen, Qian / Ling, Zhen-Hua / Yang, Chen-Yu / Dai, Li-Rong (2015): "Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions", In INTERSPEECH-2015, 1581-1585.