ISCA Archive SSW 2004

Forced alignment for speech synthesis databases using duration and prosodic phrase breaks

Arthur R. Toth

Alignment of text to recorded audio is limited by the fact that standard techniques do not handle very long utterances well. This work presents a model for segmenting long recordings into smaller utterances. Our approach differs from typical forced alignment techniques in that prosodic phrase break locations are first estimated, and then words are placed around breaks based on length and break probabilities for each word. This last step is performed by an HMM whose parameters are determined in a novel way. The results of classifying word boundaries on a well-publicized database [1] were 65.7% accuracy on actual breaks and 92.2% overall.
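The word-placement step can be illustrated with a small sketch. The abstract does not specify the HMM's structure or how its parameters are set, so the code below is not the paper's model: it is a simplified dynamic program, under stated assumptions, that assigns words to the segments between already-detected breaks. The function name place_breaks, the Gaussian duration penalty, the sigma value, and the per-word inputs (estimated word durations and the probability that a break follows each word) are all hypothetical choices made for illustration.

```python
import math


def place_breaks(word_durs, break_probs, segment_durs, sigma=0.3):
    """Return the indices of the words after which each detected break falls.

    Illustrative assumptions (not taken from the paper):
      word_durs    -- estimated duration of each word, in seconds
      break_probs  -- probability that a phrase break follows each word
      segment_durs -- measured durations of the audio between detected breaks
      sigma        -- tolerance of the Gaussian duration-mismatch penalty
    """
    n, k = len(word_durs), len(segment_durs)
    if len(break_probs) != n:
        raise ValueError("need one break probability per word")
    if k == 0 or n < k:
        raise ValueError("need at least one word per segment")

    # Clamp probabilities so that log() is always defined.
    eps = 1e-6
    probs = [min(max(p, eps), 1.0 - eps) for p in break_probs]

    # Prefix sums of durations and of log P(no break after word).
    dur_prefix, nobreak_prefix = [0.0], [0.0]
    for d, p in zip(word_durs, probs):
        dur_prefix.append(dur_prefix[-1] + d)
        nobreak_prefix.append(nobreak_prefix[-1] + math.log(1.0 - p))

    neg_inf = float("-inf")
    # best[s][i]: best log score when the first i words fill the first s segments.
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for s in range(1, k + 1):
        for i in range(s, n + 1):
            for j in range(s - 1, i):
                if best[s - 1][j] == neg_inf:
                    continue
                # Segment s holds words j .. i-1.
                dur = dur_prefix[i] - dur_prefix[j]
                score = best[s - 1][j]
                score += -0.5 * ((dur - segment_durs[s - 1]) / sigma) ** 2
                # Words j .. i-2 are not followed by a break.
                score += nobreak_prefix[i - 1] - nobreak_prefix[j]
                # Word i-1 is followed by a break, unless it ends the recording.
                if i < n:
                    score += math.log(probs[i - 1])
                if score > best[s][i]:
                    best[s][i] = score
                    back[s][i] = j

    # Trace back the word index that precedes each internal break.
    breaks, i = [], n
    for s in range(k, 1, -1):
        i = back[s][i]
        breaks.append(i - 1)
    return breaks[::-1]
```

Calling place_breaks with five word durations, their break probabilities, and two segment durations returns a one-element list holding the index of the word that precedes the single internal break. A full HMM formulation would replace the hard duration penalty with learned state and transition distributions.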


Cite as: Toth, A.R. (2004) Forced alignment for speech synthesis databases using duration and prosodic phrase breaks. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 225-226

@inproceedings{toth04_ssw,
  author={Arthur R. Toth},
  title={{Forced alignment for speech synthesis databases using duration and prosodic phrase breaks}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={225--226}
}