In this paper, we investigate the performance of segment-based detectors for three taxonomic sets of acoustic-phonetic classes. Acoustic-phonetic detectors form an important processing layer for speech event decoding in the new detection-based approach to automatic speech recognition. The detectors in this study are trained within a minimum verification error (MVE) framework, which differs markedly from the conventional maximum likelihood (ML) method. We evaluate performance on the TIMIT database by comparing detectors trained with MVE against detectors trained with ML. The MVE-trained detectors yield a remarkable reduction in detection error, which demonstrates the effectiveness of discriminative training, and of MVE in particular, in the detection-based approach to speech recognition. Beyond serving as an important processing stage in an overall speech recognition system, these detectors can also be extended to applications in diagnostic information retrieval or recognition rescoring for utterance verification.
Cite as: Fu, Q., Juang, B.-H. (2005) Segment-based phonetic class detection using minimum verification error (MVE) training. Proc. Interspeech 2005, 3029-3032, doi: 10.21437/Interspeech.2005-146
@inproceedings{fu05_interspeech,
  author={Qiang Fu and Biing-Hwang Juang},
  title={{Segment-based phonetic class detection using minimum verification error (MVE) training}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3029--3032},
  doi={10.21437/Interspeech.2005-146}
}