Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Support Vector Machines for Automatic Data Cleanup

Aravind Ganapathiraju, Joseph Picone

Institute for Signal and Information Processing, Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, MS, USA

Accurate training data is essential for training effective acoustic models for speech recognition. For conversational speech, the transcribed data in several cases has a significant word error rate, which leads to poor acoustic models. In this paper we explore a method to automatically identify such mislabelled data in the context of a hybrid support vector machine (SVM)/hidden Markov model (HMM) system, thereby building accurate acoustic models. The effectiveness of this method is demonstrated on both synthetic and real speech data. A hybrid system for OGI Alphadigits using this methodology gives a significant improvement in performance over a comparable baseline HMM system.


Bibliographic reference. Ganapathiraju, Aravind / Picone, Joseph (2000): "Support vector machines for automatic data cleanup", in ICSLP-2000, vol. 4, 210-213.