Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Aligning Letters and Phonemes for Speech Synthesis

Robert I. Damper (1,2), Yannick Marchand (2), John-David Marseters (1), Alex Bazin (1)

(1) Image, Speech and Intelligent Systems, University of Southampton, UK
(2) National Research Council Canada

A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries grow ever larger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to align English text and phonemes automatically. The quality of alignment is assessed by the performance of a pronunciation-by-analogy system using the aligned dictionary data. We find excellent performance, the best so far reported in the literature on letter-phoneme conversion, independent of the start point for alignment, indicating that the EM search space is strongly convex.
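To make the idea of EM-style letter-phoneme alignment concrete, the following is a minimal illustrative sketch and not the procedure described in the paper. It assumes a "hard" (best-path) variant of EM: a table of letter-phoneme association counts is used to find the best monotone alignment of each word by dynamic programming, and the counts are then re-estimated from those alignments until nothing changes. The names `align`, `em_align`, and `NULL`, the additive count-based score, the co-occurrence initialisation, and the toy dictionary are all assumptions made for illustration.

```python
from collections import defaultdict

NULL = "-"  # hypothetical null-phoneme symbol for silent letters (an assumption)


def align(letters, phonemes, assoc):
    """Best monotone alignment: each letter maps to one phoneme or to NULL."""
    n, m = len(letters), len(phonemes)
    if m > n:
        raise ValueError("this sketch assumes no more phonemes than letters")
    neg = float("-inf")
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(0, min(i, m) + 1):
            # Option 1: letter i-1 is silent (aligned with NULL).
            if score[i - 1][j] > neg:
                s = score[i - 1][j] + assoc[(letters[i - 1], NULL)]
                if s > score[i][j]:
                    score[i][j], back[i][j] = s, "skip"
            # Option 2: letter i-1 is aligned with phoneme j-1.
            if j > 0 and score[i - 1][j - 1] > neg:
                s = score[i - 1][j - 1] + assoc[(letters[i - 1], phonemes[j - 1])]
                if s > score[i][j]:
                    score[i][j], back[i][j] = s, "match"
    # Trace the best path back from the final cell.
    aligned, j = [], m
    for i in range(n, 0, -1):
        if back[i][j] == "match":
            aligned.append(phonemes[j - 1])
            j -= 1
        else:
            aligned.append(NULL)
    aligned.reverse()
    return aligned


def em_align(pairs, max_iter=20):
    """Alternate between aligning every word and re-counting associations."""
    # Naive start point (an assumption): count every letter-phoneme
    # co-occurrence within a word.
    assoc = defaultdict(float)
    for letters, phonemes in pairs:
        for letter in letters:
            for phoneme in phonemes:
                assoc[(letter, phoneme)] += 1.0
    previous = None
    for _ in range(max_iter):
        alignments = [align(l, p, assoc) for l, p in pairs]
        if alignments == previous:
            break  # alignments are stable, so stop iterating
        previous = alignments
        # Re-estimate the association counts from the current alignments.
        assoc = defaultdict(float)
        for (letters, _), aligned in zip(pairs, alignments):
            for letter, phoneme in zip(letters, aligned):
                assoc[(letter, phoneme)] += 1.0
    return previous


if __name__ == "__main__":
    # Toy dictionary with made-up phoneme symbols, for illustration only.
    toy = [("at", ["a", "t"]), ("hat", ["h", "a", "t"]), ("that", ["D", "a", "t"])]
    for (word, _), aligned in zip(toy, em_align(toy)):
        print(word, aligned)
```

On this toy dictionary the sketch aligns "that" as D, null, a, t, treating the "h" as the silent letter. A different start point could be substituted for the co-occurrence counts to probe sensitivity to initialisation on such small examples.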

Bibliographic reference. Damper, Robert I. / Marchand, Yannick / Marseters, John-David / Bazin, Alex (2004): "Aligning letters and phonemes for speech synthesis", in SSW5-2004, 209-214.