International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Automatic Construction of English-Chinese Translation Lexicon from Parallel Spoken Language Corpus

Bo-Xing Chen, Li-Min Du

Center for Speech Interaction Technology Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

This paper describes an algorithm for automatically constructing an English-Chinese translation lexicon from a sentence-aligned parallel spoken language corpus. First, we obtain the first part of the translation lexicon by filtering the corpus with an electronic dictionary. Second, we collect statistics on word pairs and calculate their co-occurrence probabilities to produce "The Table of Chinese-English (English-Chinese) Words Co-occurrence Association Score" and "The Table of Chinese-English (English-Chinese) Words Co-occurrence Association Verifying Score", four tables in total. Then, for each word pair in the four tables, we assign 1 as the confidence score if the pair's co-occurrence association score or co-occurrence association verifying score ranks in the top five for its source word. Next, we use the confidence score as the criterion for constructing a four-level translation lexicon. The filtered lexicon together with the 4th-level lexicon achieves a precision of 93.389% and a recall of 93.5%. This is an encouraging result, because the corpus pairs an Indo-European language with a non-Indo-European one. The algorithm uses mutual information and the association verifying score together as the criterion for constructing translation lexicons. Grading the lexicon effectively reduces the number of incorrect entries in the high-level lexicons, which makes the translation lexicon more practical. It also solves the problem of balancing precision and recall.
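The pipeline in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: the dictionary filter removes matched words from each sentence pair; the association score is pointwise mutual information over sentence-level co-occurrence; and a pair's confidence grows by 1 for each table in which it ranks in the top k for its source word. The abstract does not define the verifying score, so the sketch accepts any score tables. It also does not say how confidence maps to the four levels, so pairs are simply grouped by confidence. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# All function names below are hypothetical; they illustrate the pipeline
# described in the abstract, not the authors' implementation.

def filter_with_dictionary(corpus, dictionary):
    """Step 1 (assumed form): keep dictionary-confirmed pairs as the first part
    of the lexicon and remove those words from each aligned sentence pair."""
    filtered_lexicon = set()
    residual = []
    for en_words, zh_words in corpus:
        hits = {(e, z) for e in en_words for z in zh_words
                if z in dictionary.get(e, ())}
        filtered_lexicon |= hits
        used_en = {e for e, _ in hits}
        used_zh = {z for _, z in hits}
        residual.append(([e for e in en_words if e not in used_en],
                         [z for z in zh_words if z not in used_zh]))
    return filtered_lexicon, residual

def pmi_table(corpus):
    """Pointwise mutual information of (English, Chinese) pairs, counting
    co-occurrence once per aligned sentence pair."""
    n = len(corpus)
    en_count, zh_count, pair_count = Counter(), Counter(), Counter()
    for en_words, zh_words in corpus:
        en_set, zh_set = set(en_words), set(zh_words)
        en_count.update(en_set)
        zh_count.update(zh_set)
        pair_count.update(product(en_set, zh_set))
    return {(e, z): math.log(c * n / (en_count[e] * zh_count[z]))
            for (e, z), c in pair_count.items()}

def top_k_pairs(table, source_index, k=5):
    """Pairs that rank among the k highest-scoring pairs for their source word;
    source_index 0 treats English as the source, 1 treats Chinese."""
    by_source = defaultdict(list)
    for pair, score in table.items():
        by_source[pair[source_index]].append((score, pair))
    selected = set()
    for candidates in by_source.values():
        candidates.sort(key=lambda item: item[0], reverse=True)
        selected.update(pair for _, pair in candidates[:k])
    return selected

def confidence_scores(tables, k=5):
    """Add 1 to a pair's confidence for every table in which it is top-k."""
    confidence = Counter()
    for table, source_index in tables:
        confidence.update(top_k_pairs(table, source_index, k))
    return confidence

def confidence_tiers(confidence):
    """Group pairs by confidence score (one tier per distinct score)."""
    tiers = defaultdict(set)
    for pair, score in confidence.items():
        tiers[score].add(pair)
    return dict(tiers)

# Toy usage: a tiny made-up corpus, so k=1 instead of the paper's top five.
corpus = [
    (["hello", "world"], ["你好", "世界"]),
    (["world", "peace"], ["世界", "和平"]),
    (["peace"], ["和平"]),
    (["world"], ["世界"]),
]
dictionary = {"hello": {"你好"}}
filtered_lexicon, residual = filter_with_dictionary(corpus, dictionary)
pmi = pmi_table(residual)
# The verifying-score tables would be appended here the same way; their
# definition is not given in the abstract, so only PMI in both directions is used.
confidence = confidence_scores([(pmi, 0), (pmi, 1)], k=1)
tiers = confidence_tiers(confidence)
print(sorted(filtered_lexicon), tiers)
```

Because PMI is symmetric, the English-to-Chinese and Chinese-to-English tables share scores but can still disagree on rankings, since the top-k cut is taken per source word in each direction. A real verifying-score table would be passed to `confidence_scores` as another `(table, source_index)` entry.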


Bibliographic reference.  Chen, Bo-Xing / Du, Li-Min (2002): "Automatic construction of English-Chinese translation lexicon from parallel spoken language corpus", In ISCSLP 2002, paper 11.