Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Machine Learning of Word Pronunciation: The Case Against Abstraction

Bertjan Busser, Walter Daelemans, Antal van den Bosch

ILK / Computational Linguistics, Tilburg University, The Netherlands

Word pronunciation can be learned by inductive machine learning algorithms when it is represented as a classification task: classify a letter, within its local word context, as mapping to its pronunciation. On the basis of generalization accuracy results from empirical studies, we argue that word pronunciation, particularly in complex spelling systems such as that of English, should not be modelled in a way that abstracts from exceptions. Learning methods such as decision-tree and backpropagation learning, while trying to abstract from noise, also throw away a large number of useful exceptional cases. Our empirical results suggest that a memory-based approach, which stores all available word-pronunciation knowledge as cases in memory and generalises from this lexicon via analogical reasoning, is at all times the optimal modelling method.
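The task representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the padding symbol, the plain overlap similarity, the single-nearest-neighbour vote, and the toy lexicon with its letter-aligned phoneme labels (where "-" marks a silent letter) are all assumptions made here for illustration.

```python
from collections import Counter

PAD = "_"  # assumed padding symbol for positions outside the word


def windows(word, k=3):
    """Return one fixed-width letter window per letter: k left, focus, k right."""
    padded = PAD * k + word + PAD * k
    return [tuple(padded[i:i + 2 * k + 1]) for i in range(len(word))]


def build_memory(lexicon, k=3):
    """Store every (window, phoneme) case; nothing is pruned or abstracted."""
    memory = []
    for word, phonemes in lexicon:
        # The sketch assumes one phoneme label per letter.
        assert len(word) == len(phonemes)
        for window, phoneme in zip(windows(word, k), phonemes):
            memory.append((window, phoneme))
    return memory


def overlap(a, b):
    """Count the window positions at which two cases have the same letter."""
    return sum(x == y for x, y in zip(a, b))


def classify(window, memory):
    """Return the majority phoneme among the most similar stored cases."""
    best = max(overlap(window, stored) for stored, _ in memory)
    votes = Counter(p for stored, p in memory if overlap(window, stored) == best)
    return votes.most_common(1)[0][0]


def pronounce(word, memory, k=3):
    """Classify each letter of a word in its local context."""
    return [classify(w, memory) for w in windows(word, k)]


# Hypothetical toy lexicon; the phoneme labels are illustrative only.
LEXICON = [
    ("gave", ["g", "eI", "v", "-"]),
    ("save", ["s", "eI", "v", "-"]),
    ("have", ["h", "ae", "v", "-"]),
]

memory = build_memory(LEXICON)
print(pronounce("have", memory))
```

Because every case stays in memory, the exceptional "have" is retrieved by its own exact match rather than being outvoted by the regular "gave" and "save" pattern, which is the kind of exception an abstracting learner might discard.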

Bibliographic reference. Busser, Bertjan / Daelemans, Walter / Bosch, Antal van den (1999): "Machine learning of word pronunciation: the case against abstraction", in EUROSPEECH'99, 2123-2126.