ISCA Archive SSW 1998

Comparative evaluation of letter-to-sound conversion techniques for English text-to-speech synthesis

Robert I. Damper, Y. Marchand, M. J. Adamson, Kjell Gustafson

Dictionary look-up is the primary strategy for deriving pronunciations for input words in a text-to-speech (TTS) system. This strategy is accurate for dictionary words, but it is not complete: it is impossible to list exhaustively all input words. The proper treatment of 'unknown' words is currently an unsolved problem in TTS synthesis. There are many competing techniques for letter-to-sound conversion and the system developer must make a rational selection among them. However, it is unclear how different techniques should be properly compared. In this paper, we report a comparative assessment of the competitor methods of letter-to-sound rules, pronunciation by analogy, feedforward neural networks and a k-nearest neighbour method, with respect to their success at automatic phonemisation. This is achieved by using standardised scoring methods, test lexicon and phoneme inventories. The problem of standardising the phoneme set ('harmonisation') is deceptive: this is much harder than at first appears. The principal finding is that (contrary to the weight of opinion expressed in the literature) data-driven techniques outperform knowledge-based methods by a very significant margin.


Cite as: Damper, R.I., Marchand, Y., Adamson, M.J., Gustafson, K. (1998) Comparative evaluation of letter-to-sound conversion techniques for English text-to-speech synthesis. Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3), 53-58

@inproceedings{damper98_ssw,
  author={Robert I. Damper and Y. Marchand and M. J. Adamson and Kjell Gustafson},
  title={{Comparative evaluation of letter-to-sound conversion techniques for English text-to-speech synthesis}},
  year=1998,
  booktitle={Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3)},
  pages={53--58}
}