Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

A Self-Learning Approach to Transcription of Danish Proper Names

Ove Andersen, Paul Dalsgaard

Center for PersonKommunikation, Aalborg University, Denmark

This paper addresses the development of a self-learning system for grapheme-to-phoneme conversion, dubbed SELEGRAPH, which is applied to transcribing ordinary words and proper names. The system learns grapheme-to-phoneme conversion during a training session in which a large number of grapheme strings, each paired with a manually verified phonemic transcription, are presented to it. The two main components of the SELEGRAPH software are a Viterbi module and a grapheme-to-phoneme conversion module. During training, the Viterbi module aligns each grapheme string with its corresponding phoneme string by inserting "nulls" into the strings. The information in the resulting set of aligned pairs is then stored in a tree structure during training of the conversion module. An evaluation is carried out on three independent databases, one English and two Danish, containing ordinary words as well as proper names. Using these databases makes it possible to compare the relative complexity of transcribing ordinary English words, ordinary Danish words and selected categories of Danish proper names. The best phoneme transcription results obtained are 87.5% for the NETtalk data, 94.9% for ordinary Danish words and 92.0% for Danish family names.
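The abstract states only that a Viterbi module aligns grapheme and phoneme strings by inserting nulls. As a minimal sketch, assuming a simple cost-based dynamic-programming alignment rather than SELEGRAPH's actual scoring, the Python below shows how such null-padded, equal-length alignments can be computed. The null symbol `_`, the function `align`, the cost values and the `LIKELY` association table are hypothetical illustrations, not taken from the paper.

```python
NULL = "_"  # hypothetical null symbol inserted during alignment


def align(graphemes, phonemes, sub_cost, null_cost=1.0):
    """Minimum-cost alignment of two symbol sequences, padding with NULL.

    sub_cost(g, p) is an assumed cost for pairing grapheme g with phoneme p;
    null_cost is the assumed cost of pairing any symbol with NULL.
    Returns two equal-length lists.
    """
    n, m = len(graphemes), len(phonemes)
    inf = float("inf")
    # cost[i][j]: best cost of aligning graphemes[:i] with phonemes[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # grapheme paired with phoneme
                c = cost[i - 1][j - 1] + sub_cost(graphemes[i - 1], phonemes[j - 1])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "diag"
            if i > 0:  # grapheme paired with a NULL phoneme
                c = cost[i - 1][j] + null_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "up"
            if j > 0:  # phoneme paired with a NULL grapheme
                c = cost[i][j - 1] + null_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "left"
    # Trace the best path back from the end to recover the alignment.
    g_out, p_out = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            g_out.append(graphemes[i - 1])
            p_out.append(phonemes[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            g_out.append(graphemes[i - 1])
            p_out.append(NULL)
            i -= 1
        else:
            g_out.append(NULL)
            p_out.append(phonemes[j - 1])
            j -= 1
    return g_out[::-1], p_out[::-1]


# Hypothetical grapheme/phoneme association table for a toy example.
LIKELY = {("p", "f"), ("o", "o"), ("n", "n")}


def toy_cost(g, p):
    return 0.0 if (g, p) in LIKELY else 1.5


if __name__ == "__main__":
    print(align(list("phone"), ["f", "o", "n"], toy_cost))
    # → (['p', 'h', 'o', 'n', 'e'], ['f', '_', 'o', 'n', '_'])
```

The resulting equal-length pairs are the kind of aligned data a conversion module could learn from. In a genuinely self-learning setup the pairing costs would presumably be re-estimated from the alignments themselves rather than fixed by hand as in this toy table.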


Bibliographic reference. Andersen, Ove / Dalsgaard, Paul (1994): "A self-learning approach to transcription of Danish proper names", In ICSLP-1994, 1627-1630.