ISCA Workshop on Multilingual Speech and Language Processing (MULTILING 2006)

Center for Language and Speech Technology, Stellenbosch University, Stellenbosch, South Africa
April 9-11, 2006

Language-Dependent State Clustering for Multilingual Speech Recognition in Afrikaans, South African English, Xhosa and Zulu

Thomas Niesler

Department of Electrical and Electronic Engineering, University of Stellenbosch, South Africa

The development of automatic speech recognition systems requires significant quantities of annotated acoustic data. In South Africa, the large number of spoken languages hampers such data collection efforts. Furthermore, code switching and mixing are commonplace, since most citizens speak two or more languages fluently. As a result, a considerable degree of phonetic cross-pollination between languages can be expected. We investigate whether it is possible to combine speech data from different languages in order to improve the performance of a speech recognition system in any one language. For our investigation, we use recently collected Afrikaans, South African English, Xhosa and Zulu speech databases. We extend the decision-tree clustering process normally used to construct tied-state hidden Markov models to allow the inclusion of language-specific questions, and compare the performance of systems that allow sharing between languages with those that do not. We find that multilingual acoustic models obtained in this way show a small but consistent improvement over separate-language systems when applied to Afrikaans and English, and to Xhosa and Zulu. The improvement for the latter pair of languages is greater, which is consistent with their larger degree of phonetic similarity.
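
The idea of adding language-specific questions to decision-tree state clustering can be illustrated with a small sketch. This is not the paper's implementation: it assumes each state is summarised by one-dimensional Gaussian sufficient statistics, and the names State, log_likelihood, best_question and QUESTIONS, the question labels, and the language tags "xho" and "zul" are all hypothetical. At a tree node, the question giving the largest log-likelihood gain is chosen; a language question wins only where the data of the two languages actually differ, otherwise the states remain shared.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    language: str  # hypothetical language tag, e.g. "xho" or "zul"
    left: str      # left phonetic context
    right: str     # right phonetic context
    n: float       # number of frames assigned to this state
    s1: float      # sum of the (1-D) observations
    s2: float      # sum of the squared observations


def log_likelihood(states):
    """Log-likelihood of the pooled data under one ML-fitted 1-D Gaussian."""
    n = sum(s.n for s in states)
    if n == 0:
        return 0.0
    mean = sum(s.s1 for s in states) / n
    # Floor the variance so that degenerate clusters do not produce log(0).
    var = max(sum(s.s2 for s in states) / n - mean * mean, 1e-4)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_question(states, questions):
    """Return (name, gain) of the question with the largest likelihood gain."""
    parent = log_likelihood(states)
    best = None
    for name, predicate in questions.items():
        yes = [s for s in states if predicate(s)]
        no = [s for s in states if not predicate(s)]
        if not yes or not no:
            continue  # the question does not split this node
        gain = log_likelihood(yes) + log_likelihood(no) - parent
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


# Phonetic-context questions plus one language question (all illustrative).
QUESTIONS = {
    "L_nasal": lambda s: s.left in {"m", "n"},
    "R_vowel": lambda s: s.right in {"a", "i", "u"},
    "is_xhosa": lambda s: s.language == "xho",
}
```

Calling best_question(states, QUESTIONS) on the states pooled at a node returns the name of the winning question and its gain. In a full clustering procedure this selection would be applied recursively, with stopping thresholds on the gain and on cluster occupancy, which the sketch omits.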

Bibliographic reference. Niesler, Thomas (2006): "Language-dependent state clustering for multilingual speech recognition in Afrikaans, South African English, Xhosa and Zulu", in MULTILING-2006, paper 007.