Aligning letters and phonemes for speech synthesis

Robert I. Damper, Yannick Marchand, John-David Marseters, Alex Bazin

A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries become ever larger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to achieve automatic alignment of English text and phonemes. The quality of alignment is assessed by the performance of a pronunciation by analogy system using the aligned dictionary data. We find excellent performance (the best so far reported in the literature on letter-phoneme conversion) that is independent of the start point for alignment, indicating that the EM search space is strongly convex.
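To make the idea of iterative letter-phoneme alignment concrete, the following is a minimal sketch of a hard-decision (Viterbi-style) EM loop. It is an illustration under stated assumptions, not the procedure used in the paper: the names `best_alignment`, `align_lexicon` and the `NULL` symbol are hypothetical, each letter is assumed to map to exactly one phoneme or to a null phoneme, and words with more phonemes than letters are rejected. The start point here is a naive co-occurrence count of every letter with every phoneme in the same word.

```python
from collections import defaultdict

NULL = "_"  # hypothetical symbol for a letter with no phoneme (silent letter)


def best_alignment(letters, phonemes, score):
    """Dynamic programming: give each letter the next phoneme or NULL,
    maximizing the summed letter-phoneme association scores."""
    n, m = len(letters), len(phonemes)
    if m > n:
        raise ValueError("this sketch assumes no more phonemes than letters")
    neg = float("-inf")
    # best[i][j]: best score aligning the first i letters with the first j phonemes
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        letter = letters[i - 1]
        for j in range(min(i, m) + 1):
            # Option 1: the letter is silent, so the phoneme index stays at j.
            cand = best[i - 1][j] + score.get((letter, NULL), 0.0)
            if cand > best[i][j]:
                best[i][j], back[i][j] = cand, j
            # Option 2: the letter takes phoneme j.
            if j > 0:
                cand = best[i - 1][j - 1] + score.get((letter, phonemes[j - 1]), 0.0)
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, j - 1
    # Trace back from the full word to recover one phoneme (or NULL) per letter.
    aligned, j = [], m
    for i in range(n, 0, -1):
        prev_j = back[i][j]
        aligned.append(NULL if prev_j == j else phonemes[j - 1])
        j = prev_j
    return aligned[::-1]


def align_lexicon(lexicon, iterations=5):
    """Alternate between aligning every word and re-estimating the scores."""
    # Naive start point: associate every letter with every phoneme in its word.
    score = defaultdict(float)
    for letters, phonemes in lexicon:
        for letter in letters:
            for phoneme in phonemes:
                score[(letter, phoneme)] += 1.0
    alignments = []
    for _ in range(iterations):
        # Hard "E-step": best alignment of each word under the current scores.
        alignments = [best_alignment(l, p, score) for l, p in lexicon]
        # "M-step": recount associations from the aligned pairs only.
        new_score = defaultdict(float)
        for (letters, _), aligned in zip(lexicon, alignments):
            for letter, phoneme in zip(letters, aligned):
                new_score[(letter, phoneme)] += 1.0
        if new_score == score:  # counts stopped changing
            break
        score = new_score
    return alignments


if __name__ == "__main__":
    # Toy lexicon with made-up phoneme symbols, for illustration only.
    toy = [("cat", ["k", "a", "t"]), ("knit", ["n", "I", "t"]), ("make", ["m", "eI", "k"])]
    for (word, _), aligned in zip(toy, align_lexicon(toy)):
        print(word, aligned)
```

Each returned alignment has one entry per letter, so the aligned dictionary can be consumed directly by a system that expects one-to-one letter-phoneme pairs. A full EM treatment would use soft (expected) counts rather than a single best path; the hard-count variant is used here only to keep the sketch short.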

Cite as: Damper, R.I., Marchand, Y., Marseters, J.-D., Bazin, A. (2004) Aligning letters and phonemes for speech synthesis. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 209-214

