One of the most common approaches to language verification (LV) is phonotactic language verification. Currently, LV performance for different languages under different environments and test durations has to be compared experimentally, which makes it difficult to understand LV performance across corpora or durations. LV can be viewed as a special case of hypothesis testing, so the Neyman-Pearson theorem and other information-theoretic analyses are applicable. In this paper, we introduce a measure of phonotactic confusability based on the phonotactic distribution, which makes it possible to assess the difficulty of the verification problem analytically. We then propose a method of predicting LV performance. The effectiveness of the proposed approach is demonstrated on the NIST 2003 language recognition evaluation test set.
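As a rough illustration of the hypothesis-testing view, the sketch below scores a decoded phone sequence with a log-likelihood ratio between a target-language and an alternative phone-bigram model, and computes a symmetric Kullback-Leibler divergence between the two models as one possible stand-in for a confusability measure. The abstract does not specify the paper's actual measure or models; the bigram order, the add-alpha smoothing, the choice of symmetric KL divergence, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_model(phones, vocab, alpha=1.0):
    """Add-alpha smoothed joint distribution over phone bigrams (assumed model form)."""
    counts = Counter(zip(phones, phones[1:]))
    total = sum(counts.values()) + alpha * len(vocab) ** 2
    return {(a, b): (counts[(a, b)] + alpha) / total for a in vocab for b in vocab}


def log_likelihood_ratio(phones, target, alternative):
    """Average per-bigram log-likelihood ratio; positive values favor the target model.

    Assumes every phone in `phones` belongs to the vocabulary of both models.
    """
    pairs = list(zip(phones, phones[1:]))
    return sum(math.log(target[p] / alternative[p]) for p in pairs) / len(pairs)


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p); a hypothetical confusability proxy (smaller = more confusable)."""
    return sum((p[k] - q[k]) * math.log(p[k] / q[k]) for k in p)


# Toy phone strings standing in for decoded training data of two languages.
vocab = ["a", "b"]
target = bigram_model(list("abababab"), vocab)
alternative = bigram_model(list("aabbaabb"), vocab)

score = log_likelihood_ratio(list("ababab"), target, alternative)
divergence = symmetric_kl(target, alternative)
```

In a verification setting, `score` would be compared against a threshold to accept or reject the claimed language, and a small `divergence` between two models would suggest a harder verification problem for that language pair.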
Bibliographic reference. Wong, Ka-keung / Siu, Man-hung / Mak, Brian (2007): "A model-based estimation of phonotactic language verification performance", In INTERSPEECH-2007, 186-189.