The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu

Ewald van der Westhuizen, Thomas Niesler


We consider the phenomenon of postlexical deletion in fast, spontaneously spoken isiZulu and its implications for automatic speech recognition (ASR). Analysis of hand-crafted transcripts of fast spontaneous speech recorded from broadcast media indicates that postlexical deletion, especially of vowels, is common in isiZulu. We show that ASR performance can be improved by including pronunciation variants that model such deletions. We also apply a sequence modelling approach normally used for grapheme-to-phoneme (G2P) conversion to generate orthography containing synthetic deletions. Pronunciations for these synthetically generated contracted words are then produced using conventional G2P conversion. We evaluate an ASR system that uses these synthetically generated pronunciations, and compare it with a baseline system without such variants as well as with an oracle system. Augmentation with synthetically generated pronunciations leads to an absolute improvement in word error rate (WER) of 2.36% over the baseline. Furthermore, the augmented system performs almost as well as the oracle system, with an absolute difference in WER of 0.38%.
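
The idea of augmenting a pronunciation dictionary with deletion variants can be illustrated with a minimal sketch. This is not the method described above: the abstract uses a learned sequence model to propose contracted orthography and then applies G2P conversion, whereas the sketch below substitutes a naive rule that drops one vowel at a time. The phone symbols, the function names `deletion_variants` and `augment_lexicon`, and the single example entry are all assumptions made for illustration.

```python
# Illustrative stand-in only. The abstract describes a learned sequence model
# that proposes contracted word forms; here a naive rule deletes one vowel.
VOWELS = {"a", "e", "i", "o", "u"}  # assumed toy phone set, not the paper's


def deletion_variants(phones):
    """Return pronunciations obtained by deleting exactly one vowel phone."""
    variants = []
    for i, phone in enumerate(phones):
        if phone in VOWELS:
            variant = phones[:i] + phones[i + 1:]
            # Skip empty results and duplicates.
            if variant and variant not in variants:
                variants.append(variant)
    return variants


def augment_lexicon(lexicon):
    """Keep each canonical pronunciation and append its deletion variants."""
    augmented = {}
    for word, pronunciations in lexicon.items():
        entries = list(pronunciations)
        for pron in pronunciations:
            for variant in deletion_variants(pron):
                if variant not in entries:
                    entries.append(variant)
        augmented[word] = entries
    return augmented


# Hypothetical one-word lexicon; pronunciations are tuples of phone symbols.
lexicon = {"ukuthi": [("u", "k", "u", "th", "i")]}
augmented = augment_lexicon(lexicon)

for pron in augmented["ukuthi"]:
    print("ukuthi", " ".join(pron))
```

An exhaustive rule like this over-generates variants. Presumably that is one reason to prefer a model trained on observed deletions, which can propose only the contractions that resemble those found in real transcripts.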


DOI: 10.21437/Interspeech.2016-820

Cite as

van der Westhuizen, E., Niesler, T. (2016) The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu. Proc. Interspeech 2016, 3559-3563.

Bibtex
@inproceedings{Westhuizen+2016,
author={Ewald van der Westhuizen and Thomas Niesler},
title={The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-820},
url={http://dx.doi.org/10.21437/Interspeech.2016-820},
pages={3559--3563}
}