We investigate whether phone recognition models trained on large English databases can be used for speaker recognition in another language. Such cross-language use of recognition models is an attractive option when a speaker recognition system is to be ported to a new language that lacks the necessary data resources, while retaining some of the advantages of phone modeling and ASR-based feature extraction. We compare the performance of such systems to a baseline cepstral GMM system (which is inherently language independent) and to a phone-recognition-based system trained exclusively on Arabic data. Our results indicate that cross-language models are highly competitive and, at least in our case, have a performance advantage over both within-language training and the language-independent baseline. We also examine the effect of coverage of colloquial Arabic dialects in the training data.
Cite as: Stolcke, A., Kajarekar, S. (2008) Recognizing Arabic speakers with English phones. Proc. The Speaker and Language Recognition Workshop (Odyssey 2008), paper 24
@inproceedings{stolcke08_odyssey,
  author={Andreas Stolcke and Sachin Kajarekar},
  title={{Recognizing Arabic speakers with English phones}},
  year=2008,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2008)},
  pages={paper 24}
}