This paper studies a new way of constructing multiple phone tokenizers for language recognition. In this approach, the phone tokenizers share a common set of acoustic models, while each tokenizer has its own phone-based language model (LM) trained for a specific target language. These target-aware language models (TALMs) are constructed to capture the discriminative ability of individual phones for the desired target languages. The parallel phone tokenizers thus formed are shown to achieve better performance than the original phone recognizer. The proposed TALM differs from the LM in the traditional PPRLM technique in two respects. First, the TALM applies LM information in the front-end, whereas the PPRLM approach uses an LM in the system back-end. Second, the TALM exploits discriminative phone occurrence statistics, which differ from the traditional n-gram statistics used in the PPRLM approach. A novel way of training the TALM is also studied in this paper. Our experimental results show that the proposed method consistently improves language recognition performance on the NIST 1996, 2003 and 2007 LRE 30-second closed test sets.
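The front-end arrangement described above, with one shared set of acoustic scores decoded once per target-language LM, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a frame-level Viterbi search with one state per phone, and it uses a plain phone-transition log-probability matrix as a stand-in for a TALM, whose actual statistics and training are not specified in the abstract. The names `viterbi_decode`, `parallel_tokenize`, `target_lms`, and `lm_weight` are hypothetical.

```python
import numpy as np


def viterbi_decode(acoustic_loglik, lm_logprob, lm_weight=1.0):
    """Best phone path given shared acoustic scores and one phone LM.

    acoustic_loglik: (n_frames, n_phones) log-likelihoods from the shared
        acoustic models (assumed precomputed).
    lm_logprob: (n_phones, n_phones) transition log-probabilities; a
        placeholder for a target-aware LM, not the paper's TALM.
    """
    n_frames, n_phones = acoustic_loglik.shape
    score = acoustic_loglik[0].copy()
    backptr = np.zeros((n_frames, n_phones), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j]: best score ending in phone i at t-1, then moving to j
        cand = score[:, None] + lm_weight * lm_logprob
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + acoustic_loglik[t]
    # Trace back from the best final phone to recover the full path
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


def parallel_tokenize(acoustic_loglik, target_lms, lm_weight=1.0):
    """Run one tokenizer per target language over the same acoustic scores."""
    return {
        lang: viterbi_decode(acoustic_loglik, lm, lm_weight)
        for lang, lm in target_lms.items()
    }
```

Because the acoustic scores are computed once and only the LM changes, each tokenizer can yield a different phone sequence for the same utterance, which is the property the parallel tokenizers rely on. A real phone recognizer would use multi-state HMMs and a phone-level rather than frame-level LM.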
Bibliographic reference. Tong, Rong / Ma, Bin / Li, Haizhou / Chng, Eng Siong / Lee, Kong-Aik (2009): "Target-aware language models for spoken language recognition", In INTERSPEECH-2009, 200-203.