ISCA Archive SLTU 2012

Validating smartphone-collected speech corpora

Marelie H. Davel, Charl J. van Heerden, Etienne Barnard

We investigate how effectively the accuracy of a prompted speech corpus can be validated when minimal additional speech resources are available, and specifically when no language model in the target language is available. We compare a word-based variant of Goodness of Pronunciation (GOP) with a phone-based dynamic programming (PDP) scoring technique. The first technique derives validation scores from an acoustic likelihood ratio; the second derives them from the optimal alignment between an observed phone string (generated by a speech recogniser) and a reference phone string (obtained from a dictionary). We define a new technique for obtaining a PDP scoring matrix in a data-driven fashion, examine different ways of using GOP for word scoring, and find that variants of both techniques are effective for corpus validation.
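To make the PDP idea concrete, here is a minimal sketch of scoring one prompted word by aligning an observed phone string against its dictionary pronunciation with dynamic programming. It is not the authors' implementation. The cost table, the `GAP` marker, the default cost of 1.0 and the normalisation by reference length are all illustrative assumptions. In particular, the hand-set costs stand in for the data-driven scoring matrix the paper derives, and that derivation is not reproduced here.

```python
from typing import Dict, Sequence, Tuple

GAP = "-"  # hypothetical marker for an inserted or deleted phone

# Hypothetical cost table keyed by (reference phone, observed phone).
# (ref, GAP) is a deletion cost; (GAP, obs) is an insertion cost.
CostMatrix = Dict[Tuple[str, str], float]


def pair_cost(ref: str, obs: str, costs: CostMatrix, default: float) -> float:
    """Cost of aligning reference symbol `ref` with observed symbol `obs`."""
    if (ref, obs) in costs:
        return costs[(ref, obs)]
    # Unlisted pairs: identical phones are free, anything else costs `default`.
    return 0.0 if ref == obs else default


def pdp_cost(observed: Sequence[str], reference: Sequence[str],
             costs: CostMatrix, default: float = 1.0) -> float:
    """Minimum total cost of aligning the observed phones to the reference phones."""
    n, m = len(reference), len(observed)
    # d[i][j] holds the best cost of aligning reference[:i] with observed[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + pair_cost(reference[i - 1], GAP, costs, default)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + pair_cost(GAP, observed[j - 1], costs, default)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r, o = reference[i - 1], observed[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + pair_cost(r, o, costs, default),  # match or substitution
                d[i - 1][j] + pair_cost(r, GAP, costs, default),    # reference phone deleted
                d[i][j - 1] + pair_cost(GAP, o, costs, default),    # extra phone inserted
            )
    return d[n][m]


def pdp_score(observed: Sequence[str], reference: Sequence[str],
              costs: CostMatrix, default: float = 1.0) -> float:
    """Alignment cost per reference phone; lower means closer to the prompt."""
    return pdp_cost(observed, reference, costs, default) / max(len(reference), 1)


if __name__ == "__main__":
    costs = {("a", "e"): 0.3}  # hypothetical: a/e confusions are cheap
    print(pdp_score(["k", "e", "t"], ["k", "a", "t"], costs))  # small substitution penalty
    print(pdp_score(["k", "a"], ["k", "a", "t"], costs))       # one deleted phone
```

With this layout, a data-driven matrix would simply replace the hand-set `costs` dictionary, for example with costs estimated from recogniser confusions on trusted data. How the paper actually estimates the matrix, and which score threshold marks an utterance as suspect, is left open here.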

Index Terms: speech corpora, corpus validation, goodness of pronunciation, phone-based dynamic programming scores


Cite as: Davel, M.H., Heerden, C.J.v., Barnard, E. (2012) Validating smartphone-collected speech corpora. Proc. 3rd Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2012), 68-75

@inproceedings{davel12_sltu,
  author={Marelie H. Davel and Charl J. van Heerden and Etienne Barnard},
  title={{Validating smartphone-collected speech corpora}},
  year=2012,
  booktitle={Proc. 3rd Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2012)},
  pages={68--75}
}