Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery

Marzieh Razavi, Mathew Magimai-Doss

Development of state-of-the-art automatic speech recognition (ASR) systems requires acoustic resources (i.e., transcribed speech) as well as lexical resources (i.e., phonetic lexicons). It has been shown that acoustic and lexical resource constraints can be overcome by first training an acoustic model that captures acoustic-to-multilingual phone relationships on language-independent data, and then training a lexical model that captures grapheme-to-multilingual phone relationships on the target language data. In this paper, we show that such an approach can be employed to discover a latent space of subword units for under-resourced languages, and subsequently improve the performance of the ASR system through both acoustic and lexical model adaptation. Specifically, we present two approaches to discover the latent space: (1) inference of a subset of the multilingual phone set based on the learned grapheme-to-multilingual phone relationships, and (2) derivation of an automatic subword unit space based on clustering of the grapheme-to-multilingual phone relationships. Experimental studies on Scottish Gaelic, a truly under-resourced language, show that both approaches lead to significant performance improvements, with the latter approach yielding the best system.
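The second approach described above, clustering of grapheme-to-multilingual phone relationships, can be illustrated with a minimal sketch. Assume each target-language grapheme has an associated categorical distribution over multilingual phones (in the paper these come from the trained lexical model; the graphemes, phone inventory, divergence measure, and probability values below are invented for illustration and are not the paper's actual data or clustering algorithm):

```python
import math

# Hypothetical grapheme-to-multilingual-phone probability vectors.
# Rows: target-language graphemes; columns: four multilingual phones.
# In practice these would come from a trained lexical model; the
# numbers here are made up for illustration only.
grapheme_probs = {
    "a":  [0.70, 0.20, 0.05, 0.05],
    "à":  [0.65, 0.25, 0.05, 0.05],
    "b":  [0.05, 0.05, 0.80, 0.10],
    "bh": [0.05, 0.05, 0.10, 0.80],
}

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two categorical distributions."""
    def kl(a, b):
        return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b))
    return kl(p, q) + kl(q, p)

def cluster_graphemes(probs, threshold=0.5):
    """Greedy single-link clustering: repeatedly merge grapheme groups
    whose closest phone distributions lie within the threshold."""
    clusters = [[g] for g in probs]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(sym_kl(probs[a], probs[b])
                        for a in clusters[i] for b in clusters[j])
                if d < threshold:
                    clusters[i] += clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

# Each resulting cluster defines one automatically derived subword unit:
# graphemes with similar phone distributions share a unit.
units = cluster_graphemes(grapheme_probs)
```

With these toy values, "a" and "à" have nearly identical phone distributions and are merged into one unit, while "b" and "bh" remain distinct; the real system would instead cluster in the learned latent space and retrain the lexicon over the discovered units.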

DOI: 10.21437/Interspeech.2016-1010

Cite as

Razavi, M., Magimai-Doss, M. (2016) Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery. Proc. Interspeech 2016, 3873-3877.

@inproceedings{razavi16_interspeech,
  author={Marzieh Razavi and Mathew Magimai-Doss},
  title={Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery},
  booktitle={Interspeech 2016},
  year={2016},
  pages={3873--3877},
  doi={10.21437/Interspeech.2016-1010}
}