High-level features such as phone and word n-grams have been shown to be effective for speaker recognition, particularly when used alongside traditional acoustic speaker recognition techniques. The applicability of these high-level recognition systems is impeded by the large amount of training data needed to build robust and stable speaker models. This paper describes an extension to an existing phone n-gram based speaker recognition technique, whereby MAP adaptation is used in the speaker model training process. Results obtained for the NIST 2003 Speaker Recognition Extended Data Task indicate that a significant improvement in performance can be gained through the use of this model estimation technique. In our tests, we were able to improve performance over the baseline system while halving the training data requirement. Further experimentation using MAP adaptation on word n-gram models also showed improvement over baseline results, suggesting that the technique could be applied to other feature sets modelled by multinomial distributions.
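As an illustration of the general idea only, the following minimal sketch shows one common way to MAP-adapt a multinomial n-gram model: speaker n-gram counts are smoothed towards a background model with a relevance factor, and a test utterance is scored by an average log-likelihood ratio. The abstract does not give the paper's exact formulation, so the function names (`ngram_counts`, `map_adapt`, `llr_score`), the `relevance` parameter, and the toy phone labels are all assumptions made for this example.

```python
from collections import Counter
import math


def ngram_counts(phones, n):
    """Count n-grams (as tuples) in a sequence of phone labels."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def map_adapt(speaker_counts, background_probs, relevance=10.0):
    """Hypothetical relevance-factor adaptation of a multinomial model.

    Each speaker probability is (count + relevance * background_prob)
    divided by (total + relevance). N-grams absent from the background
    model are ignored (an assumption of this sketch).
    """
    total = sum(speaker_counts.get(g, 0) for g in background_probs)
    return {
        g: (speaker_counts.get(g, 0) + relevance * p) / (total + relevance)
        for g, p in background_probs.items()
    }


def llr_score(test_counts, speaker_probs, background_probs):
    """Average log-likelihood ratio of speaker model vs. background model."""
    used = {g: c for g, c in test_counts.items() if g in background_probs}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(
        c * (math.log(speaker_probs[g]) - math.log(background_probs[g]))
        for g, c in used.items()
    ) / total


# Toy background bigram model and a very short speaker "transcript".
background = {("a", "b"): 0.5, ("b", "b"): 0.25, ("b", "a"): 0.25}
speaker_counts = ngram_counts(["a", "b", "b", "b"], 2)
speaker_model = map_adapt(speaker_counts, background, relevance=1.0)
score = llr_score(speaker_counts, speaker_model, background)
```

With little speaker data the adapted model stays close to the background model, and n-grams the speaker never produced keep a non-zero probability. This is the property that plausibly makes such estimates more stable when training data is reduced.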
Cite as: Baker, B., Vogt, R., Mason, M., Sridharan, S. (2004) Improved phonetic and lexical speaker recognition through MAP adaptation. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 91-96
@inproceedings{baker04_odyssey, author={Brendan Baker and Robbie Vogt and Michael Mason and Sridha Sridharan}, title={{Improved phonetic and lexical speaker recognition through MAP adaptation}}, year=2004, booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)}, pages={91--96} }