Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Phonetic Labeling and Segmentation of Mixed-Lingual Prosody Databases

Harald Romsdorfer, Beat Pfister

ETH Zurich, Switzerland

This paper presents an automatic system for segmenting the speech signals used to train statistical prosody models. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matched phonetic transcription, which indicates the pronunciation variants actually realized.

Although the system is HMM-based, it uses only the speech signals of the prosody database itself, which typically consists of a few hundred sentences with a total duration of about 30 minutes. Initial phone HMMs are generated with flat-start training using the canonical transcriptions of the sentences. Then a Viterbi search for the best-matching pronunciation variants and HMM retraining are applied iteratively until convergence is attained.
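
The loop described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: each phone is modelled by a single state with one 1-D Gaussian instead of a full HMM, flat start is approximated by a uniform segmentation of the canonical transcription, and every name (`align`, `estimate`, `segment`, the variance floor) is a hypothetical choice made for illustration.

```python
import math

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def align(frames, phones, models):
    """Viterbi forced alignment; every phone must cover at least one frame.
    Returns (log score, phone index per frame)."""
    T, N, NEG = len(frames), len(phones), float("-inf")
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_gauss(frames[0], *models[phones[0]])
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG
            best, back[t][j] = (stay, j) if stay >= move else (move, j - 1)
            if best > NEG:
                score[t][j] = best + log_gauss(frames[t], *models[phones[j]])
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[T - 1][N - 1], path

def estimate(utts, results, inventory, floor=1e-2):
    """Re-estimate one Gaussian per phone from the current segmentation.
    Phones without any frames fall back to the global statistics (assumption)."""
    buckets = {p: [] for p in inventory}
    everything = []
    for frames, (variant, path) in zip(utts, results):
        for x, j in zip(frames, path):
            buckets[variant[j]].append(x)
            everything.append(x)

    def fit(xs):
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        return mean, max(var, floor)  # floor keeps tiny variances usable

    return {p: fit(xs if xs else everything) for p, xs in buckets.items()}

def segment(utts, variants_per_utt, max_iter=20):
    """variants_per_utt[i][0] is the canonical transcription of utterance i.
    Returns, per utterance, (chosen variant, phone index per frame)."""
    inventory = {p for vs in variants_per_utt for v in vs for p in v}
    # Simplified flat start: spread the canonical phones uniformly over the frames.
    results = []
    for frames, vs in zip(utts, variants_per_utt):
        T, N = len(frames), len(vs[0])
        results.append((vs[0], [min(t * N // T, N - 1) for t in range(T)]))
    for _ in range(max_iter):
        models = estimate(utts, results, inventory)
        new = []
        for frames, vs in zip(utts, variants_per_utt):
            scored = [(align(frames, v, models), v) for v in vs]
            (_, path), best = max(scored, key=lambda s: s[0][0])
            new.append((best, path))
        if new == results:  # converged: variants and boundaries are stable
            break
        results = new
    return results

# Toy data (invented): the second utterance lacks the middle phone "b".
utts = [[0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9],
        [0.1, 0.0, -0.1, 9.0, 8.9, 9.1]]
variants = [[("a", "b", "c"), ("a", "c")]] * 2
result = segment(utts, variants)
```

Each entry of `result` pairs the selected pronunciation variant (the matched transcription) with a frame-level path from which the segment boundaries can be read. A real system would replace the scalar frames with acoustic feature vectors and the single Gaussians with multi-state phone HMMs.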

Bibliographic reference. Romsdorfer, Harald / Pfister, Beat (2005): "Phonetic labeling and segmentation of mixed-lingual prosody databases", in INTERSPEECH-2005, 3281-3284.