International Symposium on Chinese Spoken Language Processing
August 23-24, 2002
Spoken Language Identification Using Bigram
Xuelin Cheng, Kaizheng Wu, Han Wang, Zongge Li
Department of Computer Science,
Fudan University, Shanghai, China
The task of an automatic language identification (ALI)
system is to determine the language of an incoming
utterance. In this paper, the decoding bigrams and extended
bigrams of each language are used to capture the
characteristics of that language. The final system covers
four languages: English, Mandarin, Japanese, and Spanish.
Phone recognizers run the Viterbi algorithm over the
decoding bigrams. The resulting phone sequences are scored
by the extended bigrams, and the classifier picks the
language with the maximum score. The system that combines
decoding bigrams and extended bigrams improves by 21.2%
over the system that uses a null grammar. It reaches an
especially high identification rate of 96% when
discriminating between Mandarin and Spanish.
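The back end described in the abstract scores a recognized phone sequence with one bigram model per language and makes a maximum decision over the language scores. That back end can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The Viterbi phone-recognition front end is omitted. Its output is taken to be a ready-made string of phone symbols.
- Each language's "extended bigram" is modeled as an ordinary add-one-smoothed phone bigram. The paper's extended bigrams may differ.
- The languages `lang_x` and `lang_y` and their training strings are hypothetical toy data.
- The helper names `train_bigram`, `score`, and `identify` are likewise illustrative.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sequence-boundary symbols (our convention)


def train_bigram(sequences, phones, alpha=1.0):
    """Train an add-alpha smoothed phone-bigram model.

    Returns a dict mapping (previous, next) -> log P(next | previous).
    Assumption: a plain smoothed bigram stands in for the paper's
    "extended bigram", whose exact form is not specified here.
    """
    targets = list(phones) + [EOS]    # symbols that can follow a history
    histories = [BOS] + list(phones)  # symbols that can act as a history
    pair_counts, history_counts = Counter(), Counter()
    for seq in sequences:
        padded = [BOS] + list(seq) + [EOS]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            history_counts[prev] += 1
    extra = alpha * len(targets)  # smoothing mass added to each denominator
    return {
        (h, t): math.log((pair_counts[(h, t)] + alpha) / (history_counts[h] + extra))
        for h in histories
        for t in targets
    }


def score(model, seq):
    """Log-probability of one recognized phone sequence under one model."""
    padded = [BOS] + list(seq) + [EOS]
    return sum(model[(prev, nxt)] for prev, nxt in zip(padded, padded[1:]))


def identify(models, seq):
    """Maximum decision: return the best-scoring language and all scores."""
    scores = {lang: score(m, seq) for lang, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: a three-phone inventory and per-language training strings.
PHONES = ["a", "b", "c"]
TRAINING = {
    "lang_x": ["abab", "abab", "aba"],
    "lang_y": ["ccac", "cac", "ccc"],
}
models = {lang: train_bigram(seqs, PHONES) for lang, seqs in TRAINING.items()}

# In a real system this string would come from a phone recognizer.
best, _ = identify(models, "abab")
print(best)  # → lang_x
```

Smoothing keeps unseen phone pairs from driving a language's score to minus infinity. Without it, a single recognition error could veto the correct language. Summing log-probabilities rather than multiplying probabilities avoids numerical underflow on long utterances.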
Cheng, Xuelin / Wu, Kaizheng / Wang, Han / Li, Zongge (2002):
"Spoken language identification using bigram",
In ISCSLP 2002, paper 98.