Deep neural networks have recently shown great promise for language recognition. In particular, the expected counts of clustered context-dependent phone states (senones) can serve as the basis of a simple but effective phonotactic system. This paper introduces multinomial i-vectors computed from senone counts and shows that they outperform current PCA-based approaches. In addition, we show that a new approach using a standard normal prior and MAP multinomial i-vector estimation further improves performance, particularly for shorter test durations. Finally, we present a reduced-complexity version of Newton's method that greatly accelerates multinomial i-vector extraction. Experimental results on the NIST LRE11 task show that this approach performs significantly better than the top-performing acoustic and phonotactic systems from that evaluation.
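To make MAP multinomial i-vector estimation concrete, here is a minimal sketch. It assumes the standard subspace multinomial form, where the senone probabilities for an utterance are phi(w) = softmax(m + T w), m is a fixed offset vector and T is a C x R subspace matrix. The estimate w maximizes the multinomial log-likelihood of the senone counts n plus a standard normal log-prior, -0.5 ||w||^2. The names `logits`, `log_softmax`, `objective`, `solve`, and `map_ivector` are hypothetical. The optimizer is plain full Newton with step halving, not the paper's reduced-complexity variant, whose details are not given here. The code is pure Python so that it stays self-contained; a real system would use a linear-algebra library and batch over utterances.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def logits(m: Vector, T: Matrix, w: Vector) -> Vector:
    # z_c = m_c + t_c . w, where t_c is row c of the C x R subspace matrix T.
    return [m_c + sum(t * x for t, x in zip(t_c, w)) for m_c, t_c in zip(m, T)]


def log_softmax(z: Vector) -> Vector:
    # Numerically stable log of the multinomial probabilities phi_c.
    top = max(z)
    lse = top + math.log(sum(math.exp(v - top) for v in z))
    return [v - lse for v in z]


def objective(counts: Vector, m: Vector, T: Matrix, w: Vector) -> float:
    # MAP objective: multinomial log-likelihood of the counts plus a
    # standard normal log-prior on w (up to an additive constant).
    log_phi = log_softmax(logits(m, T, w))
    loglik = sum(n * lp for n, lp in zip(counts, log_phi))
    return loglik - 0.5 * sum(x * x for x in w)


def solve(a: Matrix, b: Vector) -> Vector:
    # Gaussian elimination with partial pivoting for the small R x R system.
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def map_ivector(counts: Vector, m: Vector, T: Matrix,
                max_iter: int = 50, tol: float = 1e-10) -> Vector:
    """MAP multinomial i-vector for one utterance via damped Newton ascent."""
    C, R = len(counts), len(T[0])
    N = sum(counts)
    w = [0.0] * R  # start at the prior mean
    for _ in range(max_iter):
        phi = [math.exp(lp) for lp in log_softmax(logits(m, T, w))]
        resid = [counts[c] - N * phi[c] for c in range(C)]
        # Gradient of the objective: T^T (n - N phi) - w
        grad = [sum(T[c][r] * resid[c] for c in range(C)) - w[r]
                for r in range(R)]
        # Negative Hessian: N T^T (diag(phi) - phi phi^T) T + I (positive definite)
        tbar = [sum(phi[c] * T[c][r] for c in range(C)) for r in range(R)]
        neg_hess = [[N * (sum(phi[c] * T[c][i] * T[c][j] for c in range(C))
                          - tbar[i] * tbar[j]) + (1.0 if i == j else 0.0)
                     for j in range(R)] for i in range(R)]
        step = solve(neg_hess, grad)
        # Halve the step if it would noticeably decrease the objective.
        current = objective(counts, m, T, w)
        slack = 1e-12 * (1.0 + abs(current))
        t = 1.0
        while True:
            cand = [w[r] + t * step[r] for r in range(R)]
            if objective(counts, m, T, cand) >= current - slack or t < 1e-8:
                break
            t *= 0.5
        w = cand
        if max(abs(t * s) for s in step) < tol:
            break
    return w
```

For example, `map_ivector([30.0, 5.0, 10.0, 55.0], [math.log(0.25)] * 4, T)` with a 4 x 2 matrix `T` returns a 2-dimensional vector, and all-zero counts return the prior mean `[0.0, 0.0]`. Because the prior term dominates when the total count N is small, the estimate shrinks toward zero for short utterances, which is the usual intuition for why MAP estimation helps at short test durations. Forming the full R x R Hessian costs on the order of C R^2 operations per iteration, which is the kind of expense a reduced-complexity Newton variant would aim to cut.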
Bibliographic reference. McCree, Alan / Garcia-Romero, Daniel (2015): "DNN senone MAP multinomial i-vectors for phonotactic language recognition", In INTERSPEECH-2015, 394-397.