Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka

Amit Das, Preethi Jyothi, Mark Hasegawa-Johnson


In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages under consideration are Swahili, Amharic, and Dinka. This scenario involves a language mismatch: utterances spoken in the African languages were transcribed by crowd workers who were mostly native speakers of English. As a result, the transcriptions are highly prone to label inaccuracies. First, we use a recently introduced technique called mismatched crowdsourcing, which processes the raw crowd transcriptions into confusion networks. Next, we adapt both multilingual hidden Markov models (HMM) and deep neural network (DNN) models using the probabilistic transcriptions of the African languages. Finally, we report the results using both deterministic and probabilistic phone error rates (PER). Automatic speech recognition systems developed using this recipe are particularly useful for low-resource languages where there is limited access to linguistic resources and/or transcribers in the native language.
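
As a rough illustration of the phone error rate mentioned above, the following minimal sketch computes a conventional PER as the Levenshtein (edit) distance between a reference phone sequence and a hypothesized one, divided by the reference length. This is an assumption about the standard definition, not the authors' scoring code. The function names and the example phone strings are hypothetical, and the probabilistic PER is not sketched because the abstract does not define it.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences.

    Counts the minimum number of substitutions, deletions, and insertions
    needed to turn `ref` into `hyp`, using a rolling one-row table.
    """
    # prev[j] = distance between the first i-1 reference phones and
    # the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, ref_phone in enumerate(ref, start=1):
        curr = [i]  # turning i reference phones into an empty hypothesis
        for j, hyp_phone in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_phone == hyp_phone else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Conventional PER: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference phone sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example phones (illustrative only, not data from the paper).
reference = ["h", "a", "b", "a", "r", "i"]
hypothesis = ["h", "a", "p", "a", "r"]
per = phone_error_rate(reference, hypothesis)
print(f"PER = {per:.3f}")
```

Here the hypothesis differs from the reference by one substitution and one deletion, so the sketch scores two errors against six reference phones. Because insertions are counted as errors, a PER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.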


DOI: 10.21437/Interspeech.2016-657

Cite as

Das, A., Jyothi, P., Hasegawa-Johnson, M. (2016) Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka. Proc. Interspeech 2016, 3524-3528.

Bibtex
@inproceedings{Das+2016,
author={Amit Das and Preethi Jyothi and Mark Hasegawa-Johnson},
title={Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-657},
url={http://dx.doi.org/10.21437/Interspeech.2016-657},
pages={3524--3528}
}