EUROSPEECH 2003 - INTERSPEECH 2003
The language of origin of a name affects its pronunciation, so language identification is an important technology for speech synthesis and recognition. Previous work on this task has typically used training sets that are proprietary or limited in coverage. In this work, we investigate the use of a publicly available geographic database for training language ID models. We automatically cluster place names by language, and show that models trained on place name data are effective for language ID on person names. In addition, we compare several source-channel and direct models for language ID, and achieve a 24% reduction in error rate over a source-channel letter trigram model on a 26-way language ID task.
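For readers unfamiliar with the baseline, the following is a minimal sketch of a source-channel letter trigram classifier, which picks the language L maximizing P(L) * P(name | L). It is an illustration only, not the authors' implementation: the add-one smoothing, the boundary symbols `^` and `$`, the helper names (`LetterTrigramLM`, `train`, `identify`), and the toy two-language data are all assumptions, since the abstract does not specify these details.

```python
import math
from collections import defaultdict


class LetterTrigramLM:
    """Letter trigram model for one language (add-one smoothing is an assumption)."""

    def __init__(self, names):
        self.tri = defaultdict(int)   # counts of letter trigrams
        self.bi = defaultdict(int)    # counts of two-letter contexts
        self.vocab = set()
        for name in names:
            s = "^^" + name.lower() + "$"   # hypothetical boundary padding
            self.vocab.update(s)
            for i in range(2, len(s)):
                self.tri[s[i - 2:i + 1]] += 1
                self.bi[s[i - 2:i]] += 1

    def logprob(self, name, vocab_size):
        """Return log P(name | language) under the smoothed trigram model."""
        s = "^^" + name.lower() + "$"
        lp = 0.0
        for i in range(2, len(s)):
            num = self.tri.get(s[i - 2:i + 1], 0) + 1
            den = self.bi.get(s[i - 2:i], 0) + vocab_size
            lp += math.log(num / den)
        return lp


def train(data):
    """data maps language -> list of training names (e.g. clustered place names)."""
    models = {lang: LetterTrigramLM(names) for lang, names in data.items()}
    total = sum(len(names) for names in data.values())
    priors = {lang: len(names) / total for lang, names in data.items()}
    vocab = set().union(*(m.vocab for m in models.values()))
    return models, priors, len(vocab)


def identify(name, models, priors, vocab_size):
    """Source-channel decision: argmax over L of log P(L) + log P(name | L)."""
    return max(
        models,
        key=lambda lang: math.log(priors[lang])
        + models[lang].logprob(name, vocab_size),
    )


# Toy data, invented purely for illustration (not from the paper).
TOY = {
    "italian": ["rossini", "bellini", "martini", "rossi"],
    "german": ["schmidt", "schneider", "schulz", "schwarz"],
}

if __name__ == "__main__":
    models, priors, v = train(TOY)
    for query in ["schubert", "pellini"]:
        print(query, identify(query, models, priors, v))
```

A direct model, by contrast, would estimate P(L | name) without going through per-language generative models; the sketch above covers only the source-channel side of that comparison.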
Bibliographic reference. Chen, Stanley F. / Maison, Benoit (2003): "Using place name data to train language identification models", In EUROSPEECH-2003, 1349-1352.