INTERSPEECH 2004 - ICSLP
We investigate machine learning techniques for coping with highly skewed class distributions in two spontaneous speech processing tasks. Both tasks, sentence boundary and disfluency detection, provide important structural information for downstream language processing modules. We examine the effect of data set size, task, sampling method (no sampling, downsampling, oversampling, and ensemble sampling), and learning method (bagging, ensemble bagging, and boosting) for a decision tree prosody model.
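To make two of the sampling methods named above concrete, here is a minimal sketch of random downsampling and random oversampling in their generic textbook form. It is not the authors' implementation: the function names `downsample` and `oversample`, the toy labels, and the single numeric feature are all hypothetical, and ensemble sampling is left out because the abstract does not define it.

```python
import random
from collections import Counter


def _group_by_class(examples, labels):
    """Collect examples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def downsample(examples, labels, seed=0):
    """Randomly discard examples so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(out)
    return out


def oversample(examples, labels, seed=0):
    """Randomly replicate examples so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out


# Toy data (hypothetical): 90 non-boundary words and 10 boundary words,
# each described by one made-up prosodic feature value.
features = [0.1 * i for i in range(100)]
labels = ["no_boundary"] * 90 + ["boundary"] * 10

down = downsample(features, labels)
over = oversample(features, labels)
down_counts = Counter(y for _, y in down)
over_counts = Counter(y for _, y in over)
```

On this toy set, `down_counts` holds 10 examples per class and `over_counts` holds 90 per class. Downsampling throws away majority-class data, while oversampling keeps everything but repeats minority-class examples, which is the trade-off that comparing the two methods is meant to expose.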
Bibliographic reference. Liu, Yang / Shriberg, Elizabeth / Stolcke, Andreas / Harper, Mary (2004): "Using machine learning to cope with imbalanced classes in natural speech: evidence from sentence boundary and disfluency detection", In INTERSPEECH-2004, 1525-1528.