Two-Stage Data Augmentation for Low-Resourced Speech Recognition

William Hartmann, Tim Ng, Roger Hsiao, Stavros Tsakalidis, Richard Schwartz


Low-resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data: additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work, which focuses on a single technique, we combine multiple, complementary augmentation approaches. The first stage adds noise and perturbs the speed of additional copies of the original audio. In the second stage, a novel fMLLR-based augmentation is applied to bottleneck features to further improve performance. A reduction in word error rate is demonstrated on four languages from the IARPA Babel program. We present an analysis exploring why these techniques are beneficial.
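
As a rough illustration of the first stage only, the sketch below creates extra copies of a waveform by adding noise and by changing its speed. It is not the authors' implementation. The function names, the speed factors (0.9 and 1.1), the 15 dB signal-to-noise ratio, the use of white Gaussian noise, and the choice to apply noise and speed perturbation to separate copies are all assumptions made for the example; the abstract does not specify them.

```python
import math
import random


def perturb_speed(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    factor < 1 lengthens the signal, factor > 1 shortens it.
    """
    n_out = int(round(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * factor, last)   # clamp to the final input sample
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def first_stage(samples, speed_factors=(0.9, 1.1), snr_db=15.0, seed=0):
    """Return additional (augmented) copies; the original is not included.

    Hypothetical recipe: one noisy copy plus one copy per speed factor.
    """
    rng = random.Random(seed)
    copies = [add_noise(samples, snr_db, rng)]
    copies += [perturb_speed(samples, f) for f in speed_factors]
    return copies


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for real speech.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    for copy in first_stage(tone):
        print(len(copy))
```

Linear interpolation keeps the example dependency-free; a real pipeline would more likely use a band-limited resampler and recorded noise rather than synthetic white noise.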


DOI: 10.21437/Interspeech.2016-1386

Cite as

Hartmann, W., Ng, T., Hsiao, R., Tsakalidis, S., Schwartz, R. (2016) Two-Stage Data Augmentation for Low-Resourced Speech Recognition. Proc. Interspeech 2016, 2378-2382.

Bibtex
@inproceedings{Hartmann+2016,
author={William Hartmann and Tim Ng and Roger Hsiao and Stavros Tsakalidis and Richard Schwartz},
title={Two-Stage Data Augmentation for Low-Resourced Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1386},
url={http://dx.doi.org/10.21437/Interspeech.2016-1386},
pages={2378--2382}
}