State-of-the-art Automatic Speech Recognition (ASR) models struggle with accented speech, particularly when the target accent is under-represented in the training data. The acoustic variations introduced by an unfamiliar accent leave the ASR polyphone decision tree (PDT) and its associated Gaussian mixture models (GMMs) poorly matched to the test data. In this paper, we improve on previous work on adapting the polyphone decision tree by using a semi-continuous model-based approach to address the problem of data sparsity. We extend the existing PDT with additional states that share parameters, corresponding to new contextual variations identified in the adaptation data, while still robustly estimating the state-based parameters on a small adaptation set. We conduct ASR experiments on Arabic and English accents and show that our technique outperforms both Maximum A Posteriori (MAP) adaptation and a previous implementation of polyphone decision tree specialization (PDTS). Compared to MAP adaptation, we obtain a relative improvement of 7% for Dialectal Arabic and 13.8% for Accented English.
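The abstract does not spell out how the parameters of the new states are estimated, so the sketch below only illustrates the general semi-continuous idea under stated assumptions. It assumes that a new child state created by splitting a PDT leaf keeps the parent's shared Gaussian codebook (diagonal covariances) and re-estimates only its mixture weights from its small share of the adaptation data, using a MAP (Dirichlet-prior) update that interpolates toward the parent's weights with a hypothetical prior weight `tau`. The function names, `tau`, and the synthetic frames are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_posteriors(frames, means, variances, weights):
    """Per-frame posterior over the K shared codebook Gaussians (diagonal covariance)."""
    # log N(x | mu_k, diag(var_k)) for every frame/component pair -> shape (T, K)
    diff = frames[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_post = np.log(weights)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def map_child_weights(frames, means, variances, parent_weights, tau=10.0):
    """Hypothetical MAP estimate of a child state's weights over the parent's codebook.

    Only the mixture weights are state-specific; means and variances stay shared,
    which is what keeps the estimate robust on a small adaptation set.
    """
    # One EM step: soft counts per codebook Gaussian, starting from the parent weights
    gamma = gaussian_posteriors(frames, means, variances, parent_weights).sum(axis=0)
    # Dirichlet-prior MAP update: few frames -> stays near parent, many -> follows data
    return (gamma + tau * parent_weights) / (gamma.sum() + tau)

# Toy example with a two-Gaussian shared codebook in 2-D feature space
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
parent = np.array([0.5, 0.5])
# Synthetic adaptation frames for one new context, drawn near the second Gaussian
frames = rng.normal(loc=3.0, scale=1.0, size=(20, 2))
child = map_child_weights(frames, means, variances, parent, tau=10.0)
print(child)  # weight shifts toward the second Gaussian but keeps some parent mass
```

Sharing the codebook means each new state adds only K weights rather than a full set of Gaussians, and `tau` controls how far a sparsely observed state may move away from its parent.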
Index Terms: automatic speech recognition, accent adaptation
Bibliographic reference. Nallasamy, Udhyakumar / Metze, Florian / Schultz, Tanja (2012): "Enhanced polyphone decision tree adaptation for accented speech recognition", In INTERSPEECH-2012, 1902-1905.