ISCA Archive Interspeech 2015

Multilingual features based keyword search for very low-resource languages

Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney

In this paper we describe RWTH Aachen's system for keyword search (KWS) when only a very limited amount of transcribed audio data is available in the target language. This setting has become this year's primary condition within the Babel project [1], which seeks to minimize the amount of human effort while retaining reasonable KWS performance. The highlights presented in this paper therefore include graphemic acoustic modeling; multilingual features trained on language data from the previous project periods; a comparison of tandem and hybrid DNN-HMM acoustic models; processing of large amounts of text data available on the web; and morphological KWS based on automatically derived word fragments. The evaluation is performed using two training sets for each of the six languages of the current project period: the full language pack (FLP), consisting of 30 hours, and the very limited language pack (VLLP), comprising less than 3 hours of transcribed audio data. We focus on the latter of the two, which is clearly more challenging. The methods described in this work allowed us to exceed a maximum term-weighted value (MTWV) of 0.3 on five out of six languages using development queries.
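The MTWV figure quoted in the abstract is the term-weighted value (TWV) maximized over the detection threshold. The following is a minimal sketch of that metric, not the authors' scoring pipeline. It assumes the usual NIST-style definition, TWV = 1 - mean over keywords of (P_miss + beta * P_FA), with beta = 999.9 and one non-target trial per second of audio. The names `Hit`, `twv`, and `mtwv` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed NIST-style weight: (C/V) * (1/Pr_kw - 1) = 0.1 * (1/1e-4 - 1) = 999.9
BETA = 999.9


@dataclass
class Hit:
    """One hypothesized keyword detection (hypothetical container)."""
    keyword: str
    score: float    # system confidence for this detection
    correct: bool   # True if it matches a reference occurrence


def twv(hits, n_true, duration_s, threshold, beta=BETA):
    """Term-weighted value at a fixed decision threshold.

    n_true maps each keyword to its number of reference occurrences;
    keywords without reference occurrences are skipped, as is usual.
    duration_s is the amount of searched audio in seconds.
    """
    keywords = [kw for kw, n in n_true.items() if n > 0]
    penalty = 0.0
    for kw in keywords:
        accepted = [h for h in hits if h.keyword == kw and h.score >= threshold]
        n_correct = sum(1 for h in accepted if h.correct)
        n_false = len(accepted) - n_correct
        p_miss = 1.0 - n_correct / n_true[kw]
        # Assumed trial count: one non-target trial per second of audio.
        p_fa = n_false / (duration_s - n_true[kw])
        penalty += p_miss + beta * p_fa
    return 1.0 - penalty / len(keywords)


def mtwv(hits, n_true, duration_s, beta=BETA):
    """Maximum TWV over all thresholds that change the set of accepted hits."""
    # float("inf") covers the case of rejecting every detection (TWV = 0).
    thresholds = sorted({h.score for h in hits}) + [float("inf")]
    return max(twv(hits, n_true, duration_s, t, beta) for t in thresholds)
```

With beta this large, a single false alarm in an hour of audio costs a keyword roughly 0.28 TWV under these assumptions, which is why a value above 0.3 is a meaningful target when less than 3 hours of transcribed training data are available.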

doi: 10.21437/Interspeech.2015-316

Cite as: Golik, P., Tüske, Z., Schlüter, R., Ney, H. (2015) Multilingual features based keyword search for very low-resource languages. Proc. Interspeech 2015, 1260-1264, doi: 10.21437/Interspeech.2015-316

  author={Pavel Golik and Zoltán Tüske and Ralf Schlüter and Hermann Ney},
  title={{Multilingual features based keyword search for very low-resource languages}},
  booktitle={Proc. Interspeech 2015},