In the approach to spoken language recognition that models phonetic lattices with a Support Vector Machine (SVM), term weighting of the supervector of N-gram probabilities is critical to recognition performance, because the weighting prevents the SVM kernel from being dominated by a few large probabilities. We investigate several term weighting functions used in text retrieval, which can incorporate long-term semantic modeling into short-term N-gram modeling. The functions are evaluated on the NIST 2007 Language Recognition Evaluation (LRE) task. Results favor two schemes: term weighting with redundancy of term frequency (rd), which eliminates the redundancy of unit frequency co-occurrence across languages, and the combination of rd and logtf, which demonstrates the effectiveness of combining local and global weighting functions.
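The effect of term weighting on kernel domination can be illustrated with a minimal sketch. It assumes a toy set of N-gram probability supervectors, a hypothetical logarithmic local weight (`log_tf`, with an arbitrary `scale` parameter), and a smoothed inverse document frequency as a stand-in global weight. The rd function is not defined in this abstract and is not reproduced here, so none of the formulas below should be read as the ones used in the paper.

```python
import math

# Toy supervectors of N-gram probabilities (illustrative values only).
u1 = {"a_b": 0.60, "b_c": 0.05, "c_d": 0.02}
u2 = {"a_b": 0.55, "b_c": 0.04, "d_e": 0.03}
u3 = {"a_b": 0.50, "c_d": 0.10}
corpus = [u1, u2, u3]

def log_tf(p, scale=1000.0):
    # Local weight (assumed form): a logarithm compresses large probabilities.
    return math.log(1.0 + scale * p)

def idf(term, vectors):
    # Global weight (stand-in, not the paper's rd): smoothed inverse
    # document frequency computed over the set of supervectors.
    df = sum(1 for v in vectors if v.get(term, 0.0) > 0.0)
    return math.log((1 + len(vectors)) / (1 + df)) + 1.0

def weight(vec, vectors):
    # Combine the local and global weights for every N-gram in the vector.
    return {t: log_tf(p) * idf(t, vectors) for t, p in vec.items()}

def top_share(a, b):
    # Fraction of the linear kernel value a.b contributed by its largest term.
    products = [w * b.get(t, 0.0) for t, w in a.items()]
    return max(products) / sum(products)

raw_share = top_share(u1, u2)
weighted_share = top_share(weight(u1, corpus), weight(u2, corpus))
print(f"largest-term share, raw:      {raw_share:.3f}")
print(f"largest-term share, weighted: {weighted_share:.3f}")
```

In this toy example the single most probable N-gram accounts for nearly all of the unweighted linear kernel value, and for a noticeably smaller share once the weights are applied.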
Bibliographic reference. Boonsuk, Sirinoot / Zhu, Donglai / Ma, Bin / Suchato, Atiwong / Punyabukkana, Proadpran / Thatphithakkul, Nattanun / Wutiwiwatchai, Chai (2010): "A study of term weighting in phonotactic approach to spoken language recognition", In INTERSPEECH-2010, 2714-2717.