A common requirement in speech technology is to align two different symbolic representations of the same linguistic message. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries become ever bigger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to achieve automatic alignment of English text and phonemes. The quality of alignment is assessed by the performance of a pronunciation-by-analogy system using the aligned dictionary data. We find excellent performance (the best so far reported in the literature on letter-to-phoneme conversion) independent of the start point for alignment, indicating that the EM search space is strongly convex.
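The abstract does not spell out the details of the EM variant, so the following is only a minimal sketch of one way such an alignment could work, not the authors' implementation. It assumes a "hard" EM loop: association scores between letters and phonemes are initialised from raw co-occurrence counts, each word is aligned by dynamic programming, and the scores are re-estimated from the resulting alignments until they stop changing. The names `NULL`, `LEXICON`, `align` and `em_align`, the toy lexicon and its phoneme symbols, and the restriction that a word has no more phonemes than letters are all assumptions made for illustration.

```python
from collections import defaultdict

NULL = "-"  # hypothetical symbol for "this letter maps to no phoneme"

# Hypothetical toy lexicon: (spelling, phoneme list). Not data from the paper.
LEXICON = [
    ("cat", ["k", "a", "t"]),
    ("that", ["D", "a", "t"]),
    ("phone", ["f", "@U", "n"]),
    ("shop", ["S", "Q", "p"]),
]


def align(letters, phonemes, score):
    """Best monotone alignment: each letter maps to one phoneme or to NULL."""
    n, m = len(letters), len(phonemes)
    if m > n:
        raise ValueError("sketch assumes no more phonemes than letters")
    neg = float("-inf")
    # best[i][j]: best total score aligning the first i letters with the first j phonemes
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    take = [[False] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(min(i, m) + 1):
            letter = letters[i - 1]
            # Option 1: letter i is silent (maps to NULL).
            skip = best[i - 1][j] + score.get((letter, NULL), 0.0)
            # Option 2: letter i maps to phoneme j.
            use = neg
            if j > 0:
                use = best[i - 1][j - 1] + score.get((letter, phonemes[j - 1]), 0.0)
            if use > skip:
                best[i][j], take[i][j] = use, True
            else:
                best[i][j] = skip
    # Trace back from the full word to recover one phoneme (or NULL) per letter.
    out, j = [], m
    for i in range(n, 0, -1):
        if take[i][j]:
            out.append(phonemes[j - 1])
            j -= 1
        else:
            out.append(NULL)
    return out[::-1]


def em_align(lexicon, iterations=10):
    """Hard-EM loop: align with current scores, then re-count from the alignments."""
    # Naive start point: every letter is associated with every phoneme in its word.
    counts = defaultdict(float)
    for letters, phonemes in lexicon:
        for letter in letters:
            for phoneme in phonemes:
                counts[(letter, phoneme)] += 1.0
    alignments = None
    for _ in range(iterations):
        new = [align(letters, phonemes, counts) for letters, phonemes in lexicon]
        if new == alignments:  # alignments unchanged, so the counts are too
            break
        alignments = new
        counts = defaultdict(float)
        for (letters, _), aligned in zip(lexicon, alignments):
            for letter, phoneme in zip(letters, aligned):
                counts[(letter, phoneme)] += 1.0
    return alignments
```

Calling `em_align(LEXICON)` returns one list per word, holding one phoneme or `NULL` per letter. Raw counts are used as scores purely for simplicity; a probabilistic variant would normalise them and sum log-probabilities instead.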
Cite as: Damper, R.I., Marchand, Y., Marseters, J.-D., Bazin, A. (2004) Aligning letters and phonemes for speech synthesis. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 209-214
@inproceedings{damper04_ssw,
  author={Robert I. Damper and Yannick Marchand and John-David Marseters and Alex Bazin},
  title={{Aligning letters and phonemes for speech synthesis}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={209--214}
}