International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Spoken Language Identification Using Bigram

Xuelin Cheng, Kaizheng Wu, Han Wang, Zongge Li

Department of Computer Science, Fudan University, Shanghai, China

The task of an automatic language identification (ALI) system is to determine which language an incoming utterance is spoken in. In this paper, the decoding bigram and the extended bigrams of each language are exploited to capture the characteristics of the languages. In the final system, which covers four languages (English, Mandarin, Japanese and Spanish), the phone sequences output by the phone recognizers, which apply the Viterbi algorithm over the decoding bigrams, are fed into the extended bigrams, and the classifier makes a maximum-score decision based on the resulting language scores. The system combining decoding bigrams and extended bigrams shows an improvement of 21.2% over the system with a null grammar, and in particular achieves a high identification rate of 96% between Mandarin and Spanish.
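The scoring and maximum-decision step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain add-one-smoothed phone bigram as a stand-in for the paper's extended bigrams (whose exact form the abstract does not give), it scores one shared phone sequence with every language model rather than one sequence per language-specific recognizer, and all names (`train_bigram`, `score`, `identify`, `lang_x`, `lang_y`) and the toy phone data are hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start symbol


def train_bigram(sequences, vocab_size, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities.

    The smoothing scheme is an assumption; the abstract does not specify one.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logprob


def score(logprob, seq):
    """Language score: sum of bigram log-probabilities over the phone sequence."""
    padded = [START] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


def identify(models, seq):
    """Maximum decision: pick the language whose bigram gives the highest score."""
    scores = {lang: score(lp, seq) for lang, lp in models.items()}
    return max(scores, key=scores.get), scores


# Toy phone sequences (invented for illustration only).
vocab = {"a", "b", "k", "s"}
training = {
    "lang_x": [["a", "b", "a", "b"], ["a", "b", "a"]],
    "lang_y": [["k", "s", "k", "s"], ["s", "k", "s"]],
}
models = {lang: train_bigram(seqs, len(vocab)) for lang, seqs in training.items()}

best, scores = identify(models, ["a", "b", "a"])
print(best, scores)
```

In a system closer to the one described, each language would have its own phone recognizer, and the sequence it decodes would be scored by that language's extended bigram before the scores are compared.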

Bibliographic reference. Cheng, Xuelin / Wu, Kaizheng / Wang, Han / Li, Zongge (2002): "Spoken language identification using bigram", in ISCSLP 2002, paper 98.