14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Learning Speaker-Specific Pronunciations of Disordered Speech

H. Christensen, Phil D. Green, Thomas Hain

University of Sheffield, UK

One of the main clinical applications of speech technology is voice-enabled assistive technology for people with disordered speech. Progress in this area is hampered by the sparsity of suitable data, and recent research has focused on ways of incorporating knowledge about typical (i.e., unimpaired) speech, for example through the use of deep belief neural networks. This paper presents a new way of using deep belief neural networks trained on typical speech, namely to improve pronunciations for individual speakers. Analysis of the posterior probabilities shows a clear correlation between measured pronunciation 'disorderedness' and the overall speech recognition performance of the full system. Based on this, we propose a method that uses deep belief network outputs to i) identify which words are pronounced differently from what would be expected from a typical pronunciation, and ii) subsequently generate new pronunciations. We investigate different methods for pronunciation generation, as well as how best to use the modified pronunciations to inform the system development stages. Using the UAspeech database of disordered speech, we demonstrate an improvement in average accuracy from 69.76% to 70.51%, with some speakers showing individual improvements of up to 10%.
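
The two-step idea in the abstract (flag words whose pronunciation deviates from the typical one, then generate a new pronunciation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring function or the generation method, so the mean log posterior score, the greedy frame-wise decoding, the threshold value, the toy phone set and all function names below are assumptions made purely for illustration. The frame-level phone posteriors stand in for the outputs of a network trained on typical speech, and the frame alignment to the canonical pronunciation is assumed to be given.

```python
import numpy as np


def expected_phone_score(posteriors, alignment):
    """Mean log posterior of the expected phone over one word's frames.

    posteriors: (T, P) array, each row a distribution over P phone classes.
    alignment:  length-T sequence of expected phone indices, assumed to come
                from a forced alignment against the canonical pronunciation.
    Lower scores mean the speech matches the canonical pronunciation less well.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    alignment = np.asarray(alignment, dtype=int)
    frame_probs = posteriors[np.arange(len(alignment)), alignment]
    return float(np.mean(np.log(frame_probs + 1e-10)))


def decode_pronunciation(posteriors, phones, min_frames=2):
    """Greedy candidate pronunciation: best phone per frame, runs collapsed.

    Runs shorter than min_frames are dropped as likely noise (an assumption).
    """
    best = np.argmax(np.asarray(posteriors, dtype=float), axis=1)
    result = []

    def emit(phone_index, run_length):
        label = phones[phone_index]
        # Skip short runs and avoid emitting the same phone twice in a row.
        if run_length >= min_frames and (not result or result[-1] != label):
            result.append(label)

    run_phone, run_len = best[0], 0
    for p in best:
        if p == run_phone:
            run_len += 1
        else:
            emit(run_phone, run_len)
            run_phone, run_len = p, 1
    emit(run_phone, run_len)
    return result


def peaked_posteriors(indices, n_phones, peak=0.85):
    """Build toy posteriors with probability `peak` on the given phone per frame."""
    rest = (1.0 - peak) / (n_phones - 1)
    post = np.full((len(indices), n_phones), rest)
    post[np.arange(len(indices)), indices] = peak
    return post


# Toy example: the word "cat" with canonical pronunciation k ae t.
phones = ["k", "ae", "t", "d"]        # hypothetical phone inventory
alignment = [0, 0, 1, 1, 2, 2]        # expected phone index per frame
typical = peaked_posteriors([0, 0, 1, 1, 2, 2], len(phones))
disordered = peaked_posteriors([0, 0, 1, 1, 3, 3], len(phones))  # final /t/ realised as /d/

THRESHOLD = -0.5                      # hypothetical decision threshold
typical_score = expected_phone_score(typical, alignment)
disordered_score = expected_phone_score(disordered, alignment)

if disordered_score < THRESHOLD:
    new_pronunciation = decode_pronunciation(disordered, phones)
    print("flagged; candidate pronunciation:", " ".join(new_pronunciation))
```

In this toy case the typical rendition scores above the threshold and is left alone, while the disordered rendition is flagged and the candidate pronunciation k ae d is proposed. In a real system the threshold would need to be tuned per speaker, and the candidate would normally be constrained by a phone-level language model or lexicon rather than decoded greedily.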


Bibliographic reference.  Christensen, H. / Green, Phil D. / Hain, Thomas (2013): "Learning speaker-specific pronunciations of disordered speech", In INTERSPEECH-2013, 1159-1163.