Third Workshop on Spoken Language Technologies for Under-resourced Languages

Cape Town, South Africa
May 7-9, 2012

Boosting Under-Resourced Speech Recognizers by Exploiting Out-Of-Language Data - Case Study on Afrikaans

David Imseng (1,2), Hervé Bourlard (1,2), Philip N. Garner (1)

(1) Idiap Research Institute, Martigny, Switzerland
(2) Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland

Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we boost the performance of an Afrikaans speech recognizer by using already available data from other languages. To exploit the available multilingual resources, we use posterior features estimated by multilayer perceptrons that are trained on similar languages. For two different acoustic modeling techniques, Tandem and Kullback-Leibler divergence-based HMMs, the proposed multilingual system yields more than 10% relative improvement over the corresponding monolingual systems trained only on Afrikaans.
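
As a rough illustration of two quantities mentioned above, the sketch below computes a Kullback-Leibler divergence between a posterior feature vector and a categorical distribution associated with an HMM state, and the relative improvement between two error rates. The function names, the direction of the divergence, the smoothing constant, and all numbers are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """Return KL(p || q) for two discrete distributions of equal length.

    p and q are sequences of probabilities. eps is a hypothetical floor
    applied to q so that the logarithm stays finite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def relative_improvement(baseline_error, new_error):
    """Return the relative error reduction of new_error over baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Made-up values: a posterior feature over three phone classes and a
# categorical distribution attached to one HMM state.
posterior_feature = [0.7, 0.2, 0.1]
state_distribution = [0.6, 0.3, 0.1]
local_score = kl_divergence(posterior_feature, state_distribution)

# Made-up error rates: going from 20.0% to 18.0% is a 10% relative gain.
gain = relative_improvement(20.0, 18.0)
```

In this sketch a smaller divergence means the posterior feature matches the state distribution more closely, so the value can serve as a local score during decoding.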

Index Terms: Multilingual speech recognition, posterior features, under-resourced languages, Afrikaans

Bibliographic reference. Imseng, David / Bourlard, Hervé / Garner, Philip N. (2012): "Boosting under-resourced speech recognizers by exploiting out-of-language data - case study on Afrikaans", in SLTU-2012, 60-67.