INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

An Empirical Study of Multilingual and Low-Resource Spoken Term Detection Using Deep Neural Networks

Jie Li, Xiaorui Wang, Bo Xu

Chinese Academy of Sciences, China

As a further step of our previous work, this paper focuses on how to improve a multilingual spoken term detection (STD) system through the use of a shared-hidden-layer multilingual DNN (SHL-MDNN). Seven languages, namely Arabic, English, German, Japanese, Korean, Mandarin and Spanish, are used in our experiments. Compared with our original multilingual STD system, which is based on Subspace GMMs (SGMMs), the resulting system reduces the average equal error rate (EER) across the seven languages by 17.2%.
   Our STD system is also evaluated under low-resource conditions. We choose Mandarin and English as the two target languages and simulate different degrees of available resources. The experimental results show that, with the help of cross-lingual model transfer, our STD system can be substantially improved in low-resource settings. To further improve performance, we also attempt to apply the dropout strategy during cross-lingual model transfer. However, no significant improvement is observed in our experiments, indicating that dropout is not particularly effective for the cross-lingual model transfer task.
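The core of the SHL-MDNN approach is that all languages share one stack of hidden layers while each language keeps its own softmax output layer. The sketch below illustrates that structure in NumPy; the layer sizes, senone counts, and language names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared hidden layers: a single stack of weights reused by every language.
# Dimensions are illustrative only.
feat_dim, hidden_dim = 40, 128
shared = [
    (rng.standard_normal((feat_dim, hidden_dim)) * 0.01, np.zeros(hidden_dim)),
    (rng.standard_normal((hidden_dim, hidden_dim)) * 0.01, np.zeros(hidden_dim)),
]

# Language-specific output layers: one softmax head per language, sized to
# that language's number of tied-state (senone) targets (hypothetical counts).
senones = {"mandarin": 3000, "english": 2500}
heads = {
    lang: (rng.standard_normal((hidden_dim, n)) * 0.01, np.zeros(n))
    for lang, n in senones.items()
}

def forward(x, lang):
    """Pass features through the shared stack, then the language's own head."""
    h = x
    for w, b in shared:
        h = relu(h @ w + b)
    w, b = heads[lang]
    return softmax(h @ w + b)

# The same input batch flows through identical shared weights for both
# languages; only the final softmax layer differs.
batch = rng.standard_normal((4, feat_dim))
post_m = forward(batch, "mandarin")
post_e = forward(batch, "english")
```

Cross-lingual model transfer then amounts to reusing the trained `shared` stack for a new low-resource language and training only a fresh output head (and optionally fine-tuning the shared layers) on the limited target data.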


Bibliographic reference.  Li, Jie / Wang, Xiaorui / Xu, Bo (2014): "An empirical study of multilingual and low-resource spoken term detection using deep neural networks", In INTERSPEECH-2014, 1747-1751.