14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Combining In-Domain and Out-of-Domain Speech Data for Automatic Recognition of Disordered Speech

H. Christensen (1), M. B. Aniol (2), Peter Bell (2), Phil D. Green (1), Thomas Hain (1), Simon King (2), Pawel Swietojanski (2)

(1) University of Sheffield, UK
(2) University of Edinburgh, UK

Recently there has been increasing interest in ways of using out-of-domain (OOD) data to improve automatic speech recognition performance in domains where only limited data is available. This paper focuses on one such domain, namely that of disordered speech, for which only very small databases exist, but where normal speech can be considered OOD. Standard approaches for handling small data domains use adaptation from OOD models into the target domain, but here we investigate an alternative approach that focuses on the feature extraction stage: OOD data is used to train feature-generating deep belief neural networks. Using AMI meeting and TED talk datasets, we investigate various tandem-based speaker-independent systems as well as maximum a posteriori adapted speaker-dependent systems. Results on the UAspeech isolated word task of disordered speech are very promising, with our overall best system (using a combination of AMI and TED data) giving a correctness of 62.5%, an increase of 15% on previously best published results based on conventional model adaptation. We show that the relative benefit of using OOD data varies considerably from speaker to speaker and is only loosely correlated with the severity of a speaker's impairments.

Bibliographic reference. Christensen, H. / Aniol, M. B. / Bell, Peter / Green, Phil D. / Hain, Thomas / King, Simon / Swietojanski, Pawel (2013): "Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech", in INTERSPEECH-2013, 3642-3645.