Third Workshop on Spoken Language Technologies for Under-resourced Languages
Cape Town, South Africa
We investigate the effectiveness with which the accuracy of a prompted speech corpus can be validated when minimal additional speech resources are available, and specifically when a language model in the target language is not available. We compare a word-based variant of Goodness of Pronunciation (GOP) with a phone-based dynamic programming (PDP) scoring technique. The first technique uses the acoustic likelihood ratio and the second the optimal alignment between an observed phone string (generated by a speech recogniser) and a reference phone string (obtained from a dictionary) to generate validation scores. We define a new technique to obtain a PDP scoring matrix in a data-driven fashion, examine different ways of using GOP for word scoring, and find that variants of both techniques provide results that are effective for corpus validation.
Index Terms: speech corpora, corpus validation, goodness of pronunciation, phone-based dynamic programming scores
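As an illustration of the PDP idea described in the abstract, the following is a minimal sketch of scoring an observed phone string against a reference phone string by global dynamic-programming alignment. It is not the authors' implementation: the function name `pdp_score`, the flat gap penalty, the default match and mismatch values, and the normalisation by reference length are all assumptions made for this example, and the `scores` dictionary merely stands in for the data-driven scoring matrix, whose actual construction is not given here.

```python
from typing import Dict, List, Tuple


def pdp_score(
    observed: List[str],
    reference: List[str],
    scores: Dict[Tuple[str, str], float],
    default_match: float = 1.0,      # assumed value, not from the paper
    default_mismatch: float = -1.0,  # assumed value, not from the paper
    gap: float = -1.0,               # assumed flat insertion/deletion penalty
) -> float:
    """Optimal global alignment score, divided by the reference length."""

    def sub(a: str, b: str) -> float:
        # Use the supplied matrix entry when present, else fall back to defaults.
        if (a, b) in scores:
            return scores[(a, b)]
        return default_match if a == b else default_mismatch

    n, m = len(observed), len(reference)
    # dp[i][j] = best score aligning observed[:i] with reference[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub(observed[i - 1], reference[j - 1]),
                dp[i - 1][j] + gap,  # extra phone in the recogniser output
                dp[i][j - 1] + gap,  # reference phone missing from the output
            )

    return dp[n][m] / max(m, 1)


if __name__ == "__main__":
    ref = ["k", "ae", "t"]
    matrix = {("d", "t"): 0.5}  # hypothetical entry: "d" is easily confused with "t"
    print(pdp_score(["k", "ae", "t"], ref, matrix))
    print(pdp_score(["k", "ae", "d"], ref, matrix))
    print(pdp_score(["k", "t"], ref, matrix))
```

Under these assumptions, a recording whose recognised phones align closely with the dictionary pronunciation receives a higher score than one with substitutions or missing phones, so a threshold on the score could be used to flag utterances for review.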
Bibliographic reference. Davel, Marelie H. / Heerden, Charl J. van / Barnard, Etienne (2012): "Validating smartphone-collected speech corpora", In SLTU-2012, 68-75.