12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

An Empirical Study of Multilingual Spoken Term Detection

Zejun Ma, Xiaorui Wang, Bo Xu

Chinese Academy of Sciences, China

This paper introduces the design of a multilingual spoken term detection (STD) system using the CALLHOME and CALLFRIEND multilingual databases published by the Linguistic Data Consortium. In our experiments, seven languages, namely Arabic, English, German, Japanese, Korean, Mandarin Chinese and Spanish, are used to train and evaluate the STD system.

As the core module of our language-general STD system, the multilingual automatic speech recogniser combines the acoustic and language models of the seven languages into a uniform model set. Much of our work focuses on the comparison of multilingual acoustic models: the conventional global phoneme set (GPS) based method and the recently proposed subspace GMM (SGMM) method [1] are investigated in detail. The experimental results demonstrate the viability of our multilingual STD system. The resulting multilingual system not only supports seven different languages but also gives satisfactory performance gains over the monolingual systems.


  1. D. Povey, L. Burget, M. Agarwal, P. Akyazi, et al., “Subspace Gaussian Mixture Models for Speech Recognition,” in Proc. ICASSP’10, Dallas, Texas, USA, March 2010, pp. 4330-4333.

Bibliographic reference.  Ma, Zejun / Wang, Xiaorui / Xu, Bo (2011): "An empirical study of multilingual spoken term detection", In INTERSPEECH-2011, 1921-1924.