15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Language Independent and Unsupervised Acoustic Models for Speech Recognition and Keyword Spotting

Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath

University of Cambridge, UK

Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to addressing the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language-dependent and/or multi-language (all training languages) Tandem acoustic models (AMs) are then trained. This work considers a particular scenario in which the target language is unseen in multi-language training and has limited language model training data, a limited lexicon, and acoustic training data without transcriptions. First, a zero acoustic resources case is described in which a multi-language AM is applied directly, as a language-independent AM (LIAM), to an unseen language. Second, in an unsupervised approach, a LIAM is used to obtain hypothesised transcriptions of the target language acoustic data, which are then used to train a language-dependent AM. Three languages from the IARPA Babel project are used for assessment: Vietnamese, Haitian Creole and Bengali. Performance of the zero acoustic resources system is found to be poor, with keyword spotting at best 60% of language-dependent performance. Unsupervised language-dependent training yields performance gains. For one language (Haitian Creole), the Babel target is achieved on the in-vocabulary data.
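
The two-stage unsupervised approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Utterance` type and the `liam_decode` and `train_ld_am` callables are hypothetical placeholders standing in for a real recogniser and a real acoustic model trainer, and the toy stand-ins at the bottom only show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Utterance:
    """One audio segment; `transcript` is empty when no transcription exists."""
    audio_id: str
    transcript: str = ""


def unsupervised_ld_training(
    untranscribed: Sequence[Utterance],
    liam_decode: Callable[[str], str],                 # hypothetical: LIAM recogniser
    train_ld_am: Callable[[List[Utterance]], object],  # hypothetical: AM trainer
) -> object:
    """Sketch of the flow: decode with a LIAM, then train a language-dependent AM."""
    # Stage 1: the language-independent AM produces hypothesised transcriptions
    # for target-language audio that has no manual transcriptions.
    hypothesised = [
        Utterance(u.audio_id, liam_decode(u.audio_id)) for u in untranscribed
    ]
    # Stage 2: the hypotheses are treated as training transcriptions for a
    # language-dependent AM.
    return train_ld_am(hypothesised)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_liam_decode(audio_id: str) -> str:
    return f"hypothesis for {audio_id}"


def toy_train_ld_am(data: List[Utterance]) -> Dict[str, str]:
    # A "model" that simply records which transcript it was trained on.
    return {u.audio_id: u.transcript for u in data}


if __name__ == "__main__":
    audio = [Utterance("utt1"), Utterance("utt2")]
    model = unsupervised_ld_training(audio, toy_liam_decode, toy_train_ld_am)
    print(model)
```

Passing the decoder and trainer in as callables keeps the sketch independent of any particular toolkit; in a real system both would be replaced by the actual recognition and training steps.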

Bibliographic reference. Knill, Kate M. / Gales, Mark J. F. / Ragni, Anton / Rath, Shakti P. (2014): "Language independent and unsupervised acoustic models for speech recognition and keyword spotting", In INTERSPEECH-2014, pp. 16-20.