ISCA Archive ICSLP 2000

Support vector machines for automatic data cleanup

Aravind Ganapathiraju, Joseph Picone

Accurate training data plays an important role in building effective acoustic models for speech recognition. In conversational speech, the transcriptions in a number of cases have a significant word error rate, which leads to poor acoustic models. In this paper we explore a method to automatically identify such mislabelled data within a hybrid support vector machine (SVM)/hidden Markov model (HMM) system, and thereby build more accurate acoustic models. The effectiveness of this method is demonstrated on both synthetic and real speech data. A hybrid system for OGI Alphadigits using this methodology gives a significant improvement in performance over a comparable baseline HMM system.


Cite as: Ganapathiraju, A., Picone, J. (2000) Support vector machines for automatic data cleanup. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 210-213

@inproceedings{ganapathiraju00_icslp,
  author={Aravind Ganapathiraju and Joseph Picone},
  title={{Support vector machines for automatic data cleanup}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 4, 210-213}
}