In this paper we present an approach to modelling pronunciation variation, particularly for non-native speakers, by modifying the lexicon. In this way we can model several speakers simultaneously, i.e. use the same lexicon and the same acoustic models for all speakers. We use a data-driven approach, i.e. methods based solely on the reference lexicon, the recognizer's acoustic models, and the acoustic data.
We propose a new alignment procedure that uses an estimated relation measure between the phones in the reference transcription and the phones in the alternative transcription of the new speaker data. This measure discovers statistically significant correspondences between the phones in the two transcriptions. We call this measure association strength. Rules are extracted from the alignment and used to derive pronunciation variants. After the rules are pruned according to their estimated probabilities, the most beneficial rules are used to build a common lexicon.
Experiments using the new alignment algorithm on the Wall Street Journal non-native speaker database gave pronunciation rules that performed favourably in comparison to rules obtained with other alignment methods.
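The rule extraction and pruning steps described above can be illustrated with a minimal sketch. It does not implement the association strength alignment itself, which the abstract does not define; it assumes the alignment has already produced (reference phone, observed phone) pairs. The function names, the context-free substitution rules, the relative-frequency probability estimate, and the thresholds `min_prob` and `min_count` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def extract_rules(aligned_pairs):
    """Count substitution rules from aligned (reference, observed) phone pairs."""
    rule_counts = Counter()
    ref_counts = Counter()
    for ref, obs in aligned_pairs:
        ref_counts[ref] += 1
        rule_counts[(ref, obs)] += 1
    return rule_counts, ref_counts


def prune_rules(rule_counts, ref_counts, min_prob=0.2, min_count=2):
    """Keep rules whose relative frequency and raw count pass the thresholds.

    The probability of a rule ref -> obs is estimated here as
    count(ref, obs) / count(ref); this estimator is an assumption.
    """
    kept = {}
    for (ref, obs), n in rule_counts.items():
        if ref == obs:
            continue  # identity mappings are not pronunciation rules
        prob = n / ref_counts[ref]
        if n >= min_count and prob >= min_prob:
            kept[(ref, obs)] = prob
    return kept


def apply_rules(pronunciation, rules):
    """Return the reference pronunciation plus all single-substitution variants."""
    variants = {tuple(pronunciation)}
    for i, phone in enumerate(pronunciation):
        for ref, obs in rules:
            if ref == phone:
                variant = list(pronunciation)
                variant[i] = obs
                variants.add(tuple(variant))
    return variants


# Toy aligned data (invented for illustration only).
aligned = (
    [("th", "t")] * 3 + [("th", "th")] * 5
    + [("ih", "iy")] * 1 + [("ih", "ih")] * 9
)
rule_counts, ref_counts = extract_rules(aligned)
rules = prune_rules(rule_counts, ref_counts)
variants = apply_rules(["th", "ih", "ng"], rules)
```

With this toy data, only the rule th -> t survives pruning, so the lexicon entry gains one extra variant. A real system would likely use context-dependent rules and allow several substitutions per word.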
Cite as: Amdal, I., Korkmazskiy, F., Surendran, A.C. (2000) Data-driven pronunciation modelling for non-native speakers using association strength between phones. Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium, 85-90
@inproceedings{amdal00_asr,
  author={Ingunn Amdal and Filipp Korkmazskiy and Arun C. Surendran},
  title={{Data-driven pronunciation modelling for non-native speakers using association strength between phones}},
  year=2000,
  booktitle={Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium},
  pages={85--90}
}