Fifth ISCA ITRW on Speech Synthesis
June 14-16, 2004
A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries grow ever larger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to align English text and phonemes automatically. The quality of alignment is assessed by the performance of a pronunciation-by-analogy system that uses the aligned dictionary data. Performance is excellent (the best so far reported in the letter-phoneme conversion literature) and is independent of the starting point for alignment, indicating that the EM search space is strongly convex.
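The abstract does not spell out the exact EM formulation, so the following is only a rough sketch of the general idea under assumptions of our own. It uses a simplified hard (Viterbi) EM variant. Each letter maps to exactly one phoneme or to a null symbol `_` (a silent letter), the alignment preserves order, and every phoneme is used exactly once. The E-step finds each word's best alignment under the current estimates of P(phoneme | letter), and the M-step re-estimates those probabilities from the resulting alignments. The names `em_align`, `viterbi_align`, `normalise` and `NULL`, the co-occurrence initialisation and the probability floor are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import defaultdict

NULL = "_"  # hypothetical null-phoneme symbol for silent letters (assumption)


def normalise(counts):
    """Turn (letter, phoneme) counts into conditional probabilities P(phoneme | letter)."""
    totals = defaultdict(float)
    for (letter, _phone), c in counts.items():
        totals[letter] += c
    return {(l, p): c / totals[l] for (l, p), c in counts.items()}


def viterbi_align(letters, phones, prob, floor=1e-9):
    """Best monotone alignment: each letter maps to one phoneme or NULL,
    and every phoneme is consumed exactly once, in order.
    Returns None if there are more phonemes than letters."""
    n, m = len(letters), len(phones)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        l = letters[i - 1]
        for j in range(0, min(i, m) + 1):
            # Option 1: letter i is silent (aligned to NULL), phoneme index unchanged.
            if best[i - 1][j] > neg:
                s = best[i - 1][j] + math.log(prob.get((l, NULL), floor))
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, NULL
            # Option 2: letter i produces phoneme j.
            if j > 0 and best[i - 1][j - 1] > neg:
                s = best[i - 1][j - 1] + math.log(prob.get((l, phones[j - 1]), floor))
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, phones[j - 1]
    if best[n][m] == neg:
        return None
    # Trace back from (n, m) to recover the symbol assigned to each letter.
    out, j = [], m
    for i in range(n, 0, -1):
        sym = back[i][j]
        out.append(sym)
        if sym != NULL:
            j -= 1
    return out[::-1]


def em_align(lexicon, iterations=10):
    """Hard-EM alignment of (spelling, phoneme list) pairs."""
    # Initial estimate (assumption): every letter co-occurs with each phoneme and NULL in its word.
    counts = defaultdict(float)
    for letters, phones in lexicon:
        for l in letters:
            for p in set(phones) | {NULL}:
                counts[(l, p)] += 1.0
    prob = normalise(counts)
    alignments = None
    for _ in range(iterations):
        # E-step: best alignment of every word under the current probabilities.
        new = [viterbi_align(w, ph, prob) for w, ph in lexicon]
        if new == alignments:
            break  # converged: alignments no longer change
        alignments = new
        # M-step: re-estimate P(phoneme | letter) from the current alignments.
        counts = defaultdict(float)
        for (letters, _phones), a in zip(lexicon, alignments):
            if a is not None:
                for l, p in zip(letters, a):
                    counts[(l, p)] += 1.0
        prob = normalise(counts)
    return alignments
```

For example, `em_align([("cat", ["k", "a", "t"]), ("bake", ["b", "ei", "k"])])` returns one alignment per word, with one symbol per letter. Hard EM is chosen here only because it keeps the sketch short; a soft EM that sums over all alignments in the E-step is an equally plausible reading of the abstract.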
Bibliographic reference. Damper, Robert I. / Marchand, Yannick / Marseters, John-David / Bazin, Alex (2004): "Aligning letters and phonemes for speech synthesis", In SSW5-2004, 209-214.