In this paper, we introduce a set of benchmark corpora of conversational English speech derived from the Switchboard-I and Fisher datasets. Traditional ASR research requires considerable computational resources and suffers from slow experimental turnaround times. Our goal is to introduce these new datasets to researchers in the ASR and machine learning communities (especially in academia) in order to facilitate the development of novel acoustic modeling techniques on smaller but acoustically rich corpora. We select these corpora using different state-of-the-art submodular function optimization algorithms, maximizing an acoustic quality criterion while limiting the vocabulary size (from 10 words up to 10,000 words). We provide baseline word recognition results for both GMM-based and DNN-based systems, and we release the corpus definitions and Kaldi training recipes to the public.
Bibliographic reference. Liu, Yuzong / Iyer, Rishabh / Kirchhoff, Katrin / Bilmes, Jeff (2015): "SVitchboard II and FiSVer I: high-quality limited-complexity corpora of conversational English speech", in INTERSPEECH-2015, 673-677.