We investigate the problem of automatic detection of annotation errors in single-speaker read-speech corpora used for text-to-speech (TTS) synthesis. Various word-level feature sets were used, and the performance of several detection methods based on support vector machines, extremely randomized trees, k-nearest neighbors, and the performance of novelty and outlier detection are evaluated. We show that both word- and utterance-level annotation error detections perform very well with both high precision and recall scores and with F1 measure being almost 90%, or 97%, respectively.
Bibliographic reference. Matoušek, Jindřich / Tihelka, Daniel (2013): "Annotation errors detection in TTS corpora", In INTERSPEECH-2013, 1511-1515.