Word pronunciation can be learned by inductive machine learning algorithms when it is represented as a classification task: classify a letter within its local word context as mapping to its pronunciation. On the basis of generalization accuracy results from empirical studies, we argue that word pronunciation, particularly in complex spelling systems such as that of English, should not be modelled in a way that abstracts from exceptions. Learning methods such as decision tree and backpropagation learning, while trying to abstract from noise, also throw away a large number of useful exceptional cases. Our empirical results suggest that a memory-based approach, which stores all available word-pronunciation knowledge as cases in memory and generalises from this lexicon via analogical reasoning, is at all times the optimal modelling method.
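The classification framing and the memory-based approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the padding symbol, the toy lexicon, the one-to-one alignment of letters with phoneme symbols, and the plain overlap-count nearest-neighbour rule are all assumptions made for illustration.

```python
from collections import Counter

PAD = "_"  # hypothetical padding symbol for positions outside the word


def windows(word, context=3):
    """Return one fixed-width letter window per letter of `word`."""
    padded = PAD * context + word + PAD * context
    width = 2 * context + 1
    return [tuple(padded[i:i + width]) for i in range(len(word))]


def build_memory(lexicon, context=3):
    """Store every (window, phoneme) case; nothing is pruned or abstracted."""
    memory = []
    for word, phonemes in lexicon:
        # Assumption: each letter is aligned with exactly one phoneme symbol.
        if len(word) != len(phonemes):
            raise ValueError(f"unaligned entry: {word!r}")
        memory.extend(zip(windows(word, context), phonemes))
    return memory


def overlap(a, b):
    """Count the window positions at which two windows agree."""
    return sum(x == y for x, y in zip(a, b))


def classify(memory, window):
    """Return the majority phoneme among the most similar stored cases."""
    best = max(overlap(stored, window) for stored, _ in memory)
    votes = Counter(
        label for stored, label in memory if overlap(stored, window) == best
    )
    return votes.most_common(1)[0][0]


def pronounce(memory, word, context=3):
    """Classify each letter of `word` in its local context."""
    return [classify(memory, w) for w in windows(word, context)]


# Toy lexicon with made-up phoneme symbols, for illustration only.
TOY_LEXICON = [
    ("cat", ["k", "a", "t"]),
    ("mat", ["m", "a", "t"]),
    ("cot", ["k", "o", "t"]),
]
```

Calling `build_memory(TOY_LEXICON)` and then `pronounce(memory, word)` classifies each letter of `word` by analogy to the stored cases. Because every case is kept, a word already in the lexicon is reproduced from its own exact matches, which is how exceptional spellings survive in this kind of model.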
Cite as: Busser, B., Daelemans, W., Bosch, A.v.d. (1999) Machine learning of word pronunciation: the case against abstraction. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2123-2126, doi: 10.21437/Eurospeech.1999-472
@inproceedings{busser99_eurospeech,
  author={Bertjan Busser and Walter Daelemans and Antal van den Bosch},
  title={{Machine learning of word pronunciation: the case against abstraction}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2123--2126},
  doi={10.21437/Eurospeech.1999-472}
}