Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Multi-Lingual Testing of a Self-Learning Approach to Phonemic Transcription of Orthography

Ove Andersen, Paul Dalsgaard

Center for PersonKommunikation, Aalborg University, Denmark

A self-learning system for grapheme-to-phoneme conversion is described and tested. The system acquires the knowledge needed for grapheme-to-phoneme conversion from a training session in which a large number of pairs of grapheme strings and their corresponding (manually verified) phonemic transcription strings are presented to the system. The result of the training is a stochastic decision tree in which statistics about corresponding graphemes and phonemes, as given in the training material, are stored for later retrieval. The system is tested on a number of European languages, and results from three tests are reported. In the first test, which concerns proper names, only the most probable phoneme candidate at each leaf of the tree is utilised. The second and third tests, both using a database of ordinary words, aim at analysing the phoneme and word accuracies that result from using N-Best phonemes at each leaf and from introducing phonotactic information, respectively. Using N-Best candidates in combination with phonotactic information shows phoneme and word accuracies of up to 88.5% and 46.6%, respectively.
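The idea of storing grapheme/phoneme statistics at training time and retrieving the most probable (or N-Best) phoneme candidates later can be illustrated with a minimal sketch. This is not the authors' implementation: a flat table of grapheme contexts with back-off from wide to narrow context stands in for the stochastic decision tree, the training pairs are assumed to be pre-aligned one grapheme to one phoneme, and the names `contexts`, `train`, `n_best`, `transcribe`, the context width, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

PAD = "#"  # assumed word-boundary symbol


def contexts(word, i, width):
    """Yield context keys around grapheme i, widest context first."""
    padded = PAD * width + word + PAD * width
    c = i + width  # position of grapheme i in the padded string
    for w in range(width, -1, -1):
        yield padded[c - w:c + w + 1]


def train(pairs, width=2):
    """Count phonemes seen for every grapheme context (the 'leaves')."""
    leaves = defaultdict(Counter)
    for graphemes, phonemes in pairs:
        # Assumption: one phoneme symbol per grapheme (pre-aligned data).
        assert len(graphemes) == len(phonemes)
        for i, phoneme in enumerate(phonemes):
            for key in contexts(graphemes, i, width):
                leaves[key][phoneme] += 1
    return leaves


def n_best(leaves, word, i, n=1, width=2):
    """Return up to n (phoneme, relative frequency) pairs for grapheme i,
    taken from the widest context that was seen in training."""
    for key in contexts(word, i, width):
        if key in leaves:
            total = sum(leaves[key].values())
            return [(p, k / total) for p, k in leaves[key].most_common(n)]
    return []


def transcribe(leaves, word, width=2):
    """Pick the single most probable phoneme for each grapheme."""
    result = []
    for i in range(len(word)):
        candidates = n_best(leaves, word, i, n=1, width=width)
        result.append(candidates[0][0] if candidates else "?")
    return result


# Toy, made-up training material (not from the paper).
pairs = [
    ("cat", ["k", "a", "t"]),
    ("city", ["s", "i", "t", "i"]),
    ("cot", ["k", "o", "t"]),
]
leaves = train(pairs)
print(transcribe(leaves, "cit"))
print(n_best(leaves, "c", 0, n=2))
```

Keeping several candidates per position, as `n_best` does with `n > 1`, leaves room for a later stage (such as the phonotactic information mentioned above) to choose among them; that stage is not sketched here.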

