ISCA Archive Interspeech 2008

Prosody boundary detection through context-dependent position models

Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang

In this paper, we propose to convert the prosody boundary detection task into a syllable position labeling task. To detect both prosodic word and prosodic phrase boundaries, six types of syllable positions are defined. For each position, context-dependent position models are trained from manually labeled data. These models are used to label syllable positions in unseen speech. Word and phrase boundaries are then easily derived from the syllable position labels. The proposed approach is tested on a large-scale single-speaker database. The precision and recall are 96.1% and 90.1%, respectively, for word boundaries, and 83.7% and 80.5%, respectively, for phrase boundaries. Results of a listening test show that only 28% of the word boundary errors and 50% of the phrase boundary errors made in automatic detection are critical, implying only about 2.2% and 10% critical errors for word and phrase boundaries, respectively. The results are rather good, especially considering that only acoustic features are used in this work.
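As a rough illustration of how boundaries might be derived from syllable position labels and then scored with precision and recall, consider the minimal Python sketch below. The six label names and their mapping to boundary types are assumptions made for this example only, since the abstract does not list the actual position types. The helper names `boundaries_from_labels` and `precision_recall` are likewise hypothetical.

```python
# Hypothetical six-way label set (an assumption; the abstract does not
# enumerate the position types). Each label maps to a pair
# (ends_word, ends_phrase) for the boundary that follows the syllable.
LABEL_ENDS = {
    "word_initial": (False, False),
    "word_medial": (False, False),
    "word_final": (True, False),
    "phrase_initial": (False, False),
    "phrase_final": (True, True),   # a phrase end is also treated as a word end
    "mono": (True, False),          # single-syllable word inside a phrase
}


def boundaries_from_labels(labels):
    """Return (word_boundaries, phrase_boundaries) as sets of syllable
    indices after which a boundary falls."""
    word, phrase = set(), set()
    for i, label in enumerate(labels):
        ends_word, ends_phrase = LABEL_ENDS[label]
        if ends_word:
            word.add(i)
        if ends_phrase:
            phrase.add(i)
    return word, phrase


def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions against
    manually labeled reference positions (both given as sets)."""
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy usage: five syllables with made-up labels and reference boundaries.
labels = ["phrase_initial", "word_final", "mono", "word_initial", "phrase_final"]
word_pred, phrase_pred = boundaries_from_labels(labels)
word_p, word_r = precision_recall(word_pred, {1, 4})
```

In this sketch a boundary is read off each syllable's label independently, which is what makes the derivation step simple once the labels are available; the real label inventory and any consistency constraints between neighboring labels may differ.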

doi: 10.21437/Interspeech.2008-555

Cite as: Hu, Y.-N., Chu, M., Huang, C., Zhang, Y.-N. (2008) Prosody boundary detection through context-dependent position models. Proc. Interspeech 2008, 2142-2145, doi: 10.21437/Interspeech.2008-555

  author={Yue-Ning Hu and Min Chu and Chao Huang and Yan-Ning Zhang},
  title={{Prosody boundary detection through context-dependent position models}},
  booktitle={Proc. Interspeech 2008},