Interspeech'2005 - Eurospeech
We propose a novel acoustic segment modeling approach to automatic language identification (LID). We assume that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic segment models (ASMs), without imposing any phonetic definitions. These segment models are used to decode spoken utterances into strings of segment units. The statistics of these units and their co-occurrences are then used to form ASM-derived feature vectors that discriminate between individual spoken languages. We evaluate the proposed approach on the 12-language 1996 NIST Language Recognition Evaluation (LRE) task. With test queries of about 30 seconds, our results show that the proposed ASM framework substantially reduces the LID error rate compared with the prevailing parallel PRLM (phone recognition followed by language modeling) method. We achieved an accuracy of 86.1% using a set of 128 three-state ASMs, with each state characterized by a Gaussian mixture density with 32 mixture components.
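As a rough illustration of how unit statistics and co-occurrences might be turned into a feature vector, the sketch below counts unigrams and adjacent-pair bigrams in a decoded unit string and normalizes them by the number of observations. The function name `asm_feature_vector`, the integer unit IDs, and the choice of bigrams with relative-frequency weighting are assumptions made for illustration; the abstract does not specify the exact statistics or weighting used.

```python
from collections import Counter
from itertools import product


def asm_feature_vector(units, num_units):
    """Hypothetical helper: map a decoded unit sequence to a feature vector.

    `units` is a list of integer unit IDs in [0, num_units).
    The vector holds relative unigram frequencies followed by
    relative bigram (adjacent-pair) frequencies.
    """
    unigrams = Counter(units)
    bigrams = Counter(zip(units, units[1:]))

    # Guard against division by zero for empty or single-unit sequences.
    n_uni = max(len(units), 1)
    n_bi = max(len(units) - 1, 1)

    vec = [unigrams[u] / n_uni for u in range(num_units)]
    vec += [bigrams[(a, b)] / n_bi
            for a, b in product(range(num_units), repeat=2)]
    return vec


# Toy example with an inventory of 3 units (assumed, not from the paper).
features = asm_feature_vector([0, 1, 1, 2], num_units=3)
print(len(features))
```

Under these assumptions, an inventory of 128 units would give a vector of 128 + 128² = 16,512 dimensions, most of them zero for a 30-second utterance, so a sparse representation would likely be preferable in practice.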
Bibliographic reference. Ma, Bin / Li, Haizhou / Lee, Chin-Hui (2005): "An acoustic segment modeling approach to automatic language identification", In INTERSPEECH-2005, 2829-2832.