ISCA Archive SSW 2004

Improving pronunciation dictionary coverage of names by modelling spelling variation

Justin Fackrell, Wojciech Skut

This paper describes an attempt to improve the coverage of an existing name pronunciation dictionary by modelling variation in spelling. This is done by deriving string rewrite rules that map out-of-vocabulary (OOV) words to in-vocabulary words. The rules are derived automatically and are "pronunciation-neutral" in the sense that the mappings they perform on the existing dictionary do not change the pronunciation. The approach is data-driven, and can be used online to make predictions for some (not all) OOV words, or offline to add significant numbers of new pronunciations to existing dictionaries. Offline, the approach has been used to increase coverage for four domain-based dictionaries: forenames, surnames, streetnames and placenames. For surnames, a model trained on a 23,000-entry dictionary was subsequently able to add 5,000 new entries, improving both type coverage and token coverage of the dictionaries by about 1%. An informal evaluation suggests that the proposed pronunciations are good in 80% of cases.
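
The idea of pronunciation-neutral rewrite rules can be illustrated with a small sketch. This is not the authors' algorithm, only one plausible reading of the abstract: candidate rules are taken from pairs of in-vocabulary spellings that share a pronunciation, a rule is kept only if it never changes the pronunciation when it maps one dictionary entry onto another, and an OOV word is then rewritten until it hits a dictionary entry. The lexicon, the phone strings, and all function names (`derive_rules`, `predict`, and so on) are invented for illustration.

```python
def differing_middle(a, b):
    """Strip the common prefix and suffix; return the differing substrings."""
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    s = 0
    while s < min(len(a), len(b)) - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    return a[p:len(a) - s], b[p:len(b) - s]


def rewrites(word, src, dst):
    """Yield every spelling obtained by replacing one occurrence of src with dst."""
    start = word.find(src)
    while start != -1:
        yield word[:start] + dst + word[start + len(src):]
        start = word.find(src, start + 1)


def is_neutral(rule, lexicon):
    """A rule is neutral if it never maps an entry onto a differently pronounced entry."""
    src, dst = rule
    for word, pron in lexicon.items():
        for cand in rewrites(word, src, dst):
            if cand in lexicon and lexicon[cand] != pron:
                return False
    return True


def derive_rules(lexicon):
    """Collect substring rewrites seen between same-pronunciation spellings."""
    by_pron = {}
    for word, pron in lexicon.items():
        by_pron.setdefault(pron, []).append(word)
    candidates = set()
    for words in by_pron.values():
        for w1 in words:
            for w2 in words:
                if w1 != w2:
                    src, dst = differing_middle(w1, w2)
                    # Simplification (assumption): skip pure insertions/deletions.
                    if src and dst:
                        candidates.add((src, dst))
    return {rule for rule in candidates if is_neutral(rule, lexicon)}


def predict(oov, lexicon, rules):
    """Return (in-vocabulary word, pronunciation) for an OOV word, or None."""
    for src, dst in sorted(rules):
        for cand in rewrites(oov, src, dst):
            if cand in lexicon:
                return cand, lexicon[cand]
    return None


# Toy lexicon with invented phone strings (not data from the paper).
LEXICON = {
    "smith": "s m ih th", "smyth": "s m ih th",
    "stewart": "s t uw er t", "stuart": "s t uw er t",
    "griffith": "g r ih f ih th",
    "reid": "r iy d", "reed": "r iy d",
    "lin": "l ih n", "len": "l eh n",
}
RULES = derive_rules(LEXICON)

print(sorted(RULES))
print(predict("gryffith", LEXICON, RULES))
print(predict("jones", LEXICON, RULES))
```

In this toy setting the rewrite between "i" and "e" suggested by "reid"/"reed" is discarded because it would also map "lin" onto "len", which has a different pronunciation. The word "jones" receives no prediction, consistent with the abstract's remark that only some OOV words can be handled.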


Cite as: Fackrell, J., Skut, W. (2004) Improving pronunciation dictionary coverage of names by modelling spelling variation. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 121-126

@inproceedings{fackrell04_ssw,
  author={Justin Fackrell and Wojciech Skut},
  title={{Improving pronunciation dictionary coverage of names by modelling spelling variation}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={121--126}
}