11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Fully Automatic Segmentation for Prosodic Speech Corpora

Sarah Hoffmann, Beat Pfister

ETH Zürich, Switzerland

While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data for initial training or on manual post-processing. Both are very time-consuming and slow down the porting of speech systems to new languages. In the context of prosody corpora for text-to-speech (TTS) systems, we investigated methods for fully automatic phoneme segmentation that use only the corpora to be segmented and an automatically generated transcription. We present a new method that improves the performance of HMM-based segmentation by correcting the boundaries with high precision between the training stages of the phoneme models. We show that, while initially aimed at single-speaker corpora, it performs equally well for multi-speaker corpora.


Bibliographic reference. Hoffmann, Sarah / Pfister, Beat (2010): "Fully automatic segmentation for prosodic speech corpora", in INTERSPEECH-2010, 1389-1392.