The Sheffield Wargame Corpus — Day Two and Day Three

Yulan Liu, Charles Fox, Madina Hasan, Thomas Hain


Improving the performance of distant speech recognition is of considerable current interest, driven by a desire to bring speech recognition into people’s homes. Standard approaches to this task aim to enhance the signal prior to recognition, typically using beamforming techniques on multiple channels. Only a few real-world recordings are available that allow experimentation with such techniques. This has become even more pertinent with recent work on deep neural networks that aims to learn beamforming from data. Such approaches require large multi-channel training sets, ideally with location annotation for moving speakers, which is scarce in existing corpora. This paper presents a new, freely available, extended corpus of English speech recorded in a natural setting with moving speakers. The data is recorded with diverse microphone arrays and, uniquely, with ground-truth location tracking. It extends the 8.0-hour Sheffield Wargames Corpus released at Interspeech 2013 with a further 16.6 hours of fully annotated data, including 6.1 hours of female speech to improve gender balance. Additional blog-based language model data is provided alongside, as well as a Kaldi baseline system. Results are reported with a standard Kaldi configuration and a baseline meeting recognition system.


DOI: 10.21437/Interspeech.2016-98

Cite as

Liu, Y., Fox, C., Hasan, M., Hain, T. (2016) The Sheffield Wargame Corpus — Day Two and Day Three. Proc. Interspeech 2016, 3833-3837.

BibTeX
@inproceedings{Liu+2016,
author={Yulan Liu and Charles Fox and Madina Hasan and Thomas Hain},
title={The Sheffield Wargame Corpus — Day Two and Day Three},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-98},
url={http://dx.doi.org/10.21437/Interspeech.2016-98},
pages={3833--3837}
}