Interspeech'2005 - Eurospeech
We present a bigram-based method for deriving bi-lingual dictionary entries from two corpora of spontaneous speech (as represented in transcriptions). Unlike approaches that depend on parallel corpora, our method does not require translated or otherwise aligned texts; the corpora representing the source and target languages may be unrelated with respect to size, vocabulary richness, frequency distribution, and activity type. Examples are given using Danish and Swedish transcription data (with occasional reference to English). We conclude with a discussion of the use of corpus-driven methods in language preservation and literation projects.
Bibliographic reference. Henrichsen, Peter Juel (2005): "Deriving a bi-lingual dictionary from raw transcription data", in INTERSPEECH-2005, 2229-2232.