ISCA Archive Interspeech 2005

Phonetic labeling and segmentation of mixed-lingual prosody databases

Harald Romsdorfer, Beat Pfister

This paper presents an automatic system for segmenting the speech signals used to train statistical prosody models. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matched phonetic transcription, which indicates pronunciation variants.

Although the system is HMM-based, it uses only the speech signals of the prosody database, which typically consists of a few hundred sentences with a total duration of some 30 minutes. Initial phone HMMs are generated with flat-start training using the canonical transcriptions of the sentences. Then a Viterbi search for the best-matching pronunciation variants and HMM retraining are applied iteratively until convergence.
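The training loop described above can be illustrated with a small Python sketch. It is not the authors' implementation; it is a toy under several loudly stated assumptions: features are one-dimensional, each phone is a single-state Gaussian model, the flat start is approximated by uniform segmentation of the canonical transcription, and retraining uses hard Viterbi assignments. All names (`forced_align`, `train`, `utterances`) and the data are hypothetical.

```python
import math

NEG = float("-inf")

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def forced_align(frames, phones, models):
    """Viterbi alignment of frames to a left-to-right phone sequence
    (one state per phone). Returns (log score, phone index per frame)."""
    T, N = len(frames), len(phones)
    if N > T:  # more phones than frames: no valid alignment
        return NEG, []
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_gauss(frames[0], *models[phones[0]])
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG
            best, prev = (stay, j) if stay >= move else (move, j - 1)
            if best > NEG:
                score[t][j] = best + log_gauss(frames[t], *models[phones[j]])
                back[t][j] = prev
    path = [N - 1]  # backtrace from the last phone at the last frame
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[T - 1][N - 1], path

def estimate(frames_list, chosen, paths, var_floor=1e-2):
    """Re-estimate (mean, variance) per phone from hard frame assignments."""
    data = {}
    for frames, phones, path in zip(frames_list, chosen, paths):
        for x, j in zip(frames, path):
            data.setdefault(phones[j], []).append(x)
    models = {}
    for ph, xs in data.items():
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        models[ph] = (m, max(v, var_floor))
    return models

def train(utterances, max_iter=20):
    """utterances: list of (frames, variants); variants[0] is canonical."""
    frames_list = [f for f, _ in utterances]
    chosen = [v[0] for _, v in utterances]
    # Assumed stand-in for flat start: uniform segmentation of the canonical form.
    paths = [[t * len(p) // len(f) for t in range(len(f))]
             for f, p in zip(frames_list, chosen)]
    models = estimate(frames_list, chosen, paths)
    for _ in range(max_iter):
        new_chosen, new_paths = [], []
        for frames, variants in utterances:
            best = None
            for phones in variants:
                if any(p not in models for p in phones):
                    continue  # no model yet for some phone of this variant
                score, path = forced_align(frames, phones, models)
                if best is None or score > best[0]:
                    best = (score, phones, path)
            new_chosen.append(best[1])
            new_paths.append(best[2])
        if new_chosen == chosen and new_paths == paths:
            break  # variants and segmentation unchanged: converged
        chosen, paths = new_chosen, new_paths
        # Keep old models for phones that no longer occur in any chosen variant.
        models = {**models, **estimate(frames_list, chosen, paths)}
    return models, chosen, paths

# Invented toy data: the second utterance contains no frames resembling "b".
variants = [["a", "b", "c"], ["a", "c"]]
utterances = [
    ([0.1, -0.1, 0.0, 5.1, 4.9, 5.0, 10.0, 9.9, 10.1], variants),
    ([0.0, 0.2, -0.2, 0.1, 10.1, 9.8, 10.0, 10.2], variants),
]
models, chosen, paths = train(utterances)
print(chosen)
print(paths)
```

The loop stops when neither the selected variants nor the frame-level segmentation change between iterations, which is one simple way to read "until convergence"; a likelihood-based stopping criterion would be an equally plausible choice.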

doi: 10.21437/Interspeech.2005-572

Cite as: Romsdorfer, H., Pfister, B. (2005) Phonetic labeling and segmentation of mixed-lingual prosody databases. Proc. Interspeech 2005, 3281-3284, doi: 10.21437/Interspeech.2005-572

  author={Harald Romsdorfer and Beat Pfister},
  title={{Phonetic labeling and segmentation of mixed-lingual prosody databases}},
  booktitle={Proc. Interspeech 2005},