EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

DTW-Based Phonetic Alignment Using Multiple Acoustic Features

Sergio Paulo, Luis C. Oliveira

INESC-ID/IST, Portugal

This paper reports our work on improving the accuracy of a DTW-based automatic phonetic aligner. The adopted model assumes that the phonetic segment sequence is already known, so the only goal is to align the spoken utterance with a reference synthetic signal produced by waveform concatenation without prosodic modifications. Instead of using a single acoustic measure to compute the alignment cost function, our strategy uses a combination of acoustic features that depends on the pair of phonetic segment classes being aligned. The results show that this strategy considerably reduces segment boundary location errors, even when the synthetic and natural speech signals come from speakers of different genders.
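
As a rough illustration of the idea, the sketch below runs a textbook DTW in which the local frame cost is a weighted sum of per-feature distances, with the weights looked up from a class-pair label attached to each reference frame. It is not the authors' implementation: the weight table, the class names, the way a class pair is assigned to a frame, the absolute-difference distance, and the function names (`frame_cost`, `dtw_align`, `map_boundaries`) are all assumptions made for this example.

```python
import math

# Hypothetical weight table: one weight per acoustic feature, keyed by a
# pair of phonetic classes. The classes and values are illustrative only.
WEIGHTS = {
    ("silence", "vowel"): (1.0, 0.2),
    ("vowel", "vowel"): (0.3, 1.0),
}
DEFAULT_WEIGHTS = (0.5, 0.5)  # assumed fallback for unlisted pairs


def frame_cost(ref_frame, utt_frame, weights):
    """Weighted sum of absolute per-feature differences (assumed distance)."""
    return sum(w * abs(r - u) for w, r, u in zip(weights, ref_frame, utt_frame))


def dtw_align(ref, utt, ref_pairs, weights=WEIGHTS, default=DEFAULT_WEIGHTS):
    """Align reference frames to utterance frames.

    ref, utt:  lists of feature tuples, one tuple per frame.
    ref_pairs: class-pair label for each reference frame (assumed given).
    Returns (total cost, path), where path is a list of (ref, utt) indices.
    """
    n, m = len(ref), len(utt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        w = weights.get(ref_pairs[i - 1], default)
        for j in range(1, m + 1):
            d = frame_cost(ref[i - 1], utt[j - 1], w)
            # Standard symmetric step pattern: diagonal, vertical, horizontal.
            best, prev = min(
                (cost[i - 1][j - 1], (i - 1, j - 1)),
                (cost[i - 1][j], (i - 1, j)),
                (cost[i][j - 1], (i, j - 1)),
            )
            cost[i][j] = d + best
            back[i][j] = prev
    # Backtrack from the end of both signals to the start.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = back[i][j]
    path.reverse()
    return cost[n][m], path


def map_boundaries(path, ref_boundaries):
    """Map each reference boundary frame to the first utterance frame aligned to it."""
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[b] for b in ref_boundaries]


# Toy usage: two features per frame, one segment boundary at reference frame 2.
ref = [(0.0, 0.0), (0.0, 0.0), (1.0, 5.0), (1.0, 5.0)]
utt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 5.0), (1.0, 5.0)]
ref_pairs = [("silence", "vowel")] * 2 + [("vowel", "vowel")] * 2
total, path = dtw_align(ref, utt, ref_pairs)
boundaries = map_boundaries(path, [2])
```

Because the segment boundaries of the synthetic reference are known, projecting them through the warping path gives the boundary estimates for the natural utterance. In a real aligner the frames would hold acoustic features extracted from the two signals rather than toy tuples.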

Bibliographic reference. Paulo, Sergio / Oliveira, Luis C. (2003): "DTW-based phonetic alignment using multiple acoustic features", in EUROSPEECH-2003, 309-312.