In this paper, we propose an acoustic model training technique that is robust against outliers such as clipping, unexpected noise, poorly pronounced word segments, or mis-transcriptions. Such outliers deteriorate the quality of the acoustic models and, in turn, degrade speech recognition performance. The outlier-robust acoustic model training technique is based on a maximum likelihood (ML) criterion and automatically detects and removes outliers from the training data. Experiments with training data artificially contaminated by mis-transcriptions show that the proposed technique achieves nearly the same word error rate on contaminated data as on uncontaminated data. Applied to a dialogue speech database with unknown outliers, the technique reduces the errors by 4.03%.
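To make the general idea concrete, below is a minimal sketch (not the authors' implementation) of likelihood-based outlier filtering for acoustic model training data. It assumes each training utterance has already been scored with an average per-frame log-likelihood under a baseline ML-trained model; utterances whose scores fall far below a robust centre (median / MAD) are treated as outliers and excluded before retraining. The threshold factor `k`, the helper names, and the synthetic scores are illustrative assumptions, not values or details taken from the paper.

```python
import numpy as np


def detect_outliers(avg_loglik: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking utterances to keep for retraining.

    avg_loglik: per-utterance average frame log-likelihoods under a
    baseline ML-trained acoustic model (assumed precomputed).
    """
    med = np.median(avg_loglik)
    # Median absolute deviation, scaled (1.4826) to be comparable to a
    # standard deviation under a Gaussian assumption.
    mad = 1.4826 * np.median(np.abs(avg_loglik - med))
    threshold = med - k * mad
    # Utterances scoring far below the robust centre are flagged as outliers.
    return avg_loglik >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic per-utterance scores: mostly well-matched utterances plus a
    # few badly matched ones (e.g. mis-transcribed) with much lower scores.
    scores = np.concatenate([rng.normal(-60.0, 2.0, 95),
                             rng.normal(-90.0, 5.0, 5)])
    keep = detect_outliers(scores)
    print(f"kept {keep.sum()} of {scores.size} utterances for retraining")
```

In this sketch, the surviving utterances would then be used to re-estimate the acoustic model, mirroring the detect-and-remove loop described in the abstract.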
Cite as: Matsuda, S., Herbordt, W., Nakamura, S. (2005) Outlier detection for acoustic model training using robust statistics. Proc. Interspeech 2005, 3337-3340, doi: 10.21437/Interspeech.2005-857
@inproceedings{matsuda05_interspeech,
  author={Shigeki Matsuda and Wolfgang Herbordt and Satoshi Nakamura},
  title={{Outlier detection for acoustic model training using robust statistics}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3337--3340},
  doi={10.21437/Interspeech.2005-857}
}