The region-dependent transform (RDT) is a feature extraction method for speech recognition that employs the Minimum Phoneme Error (MPE) criterion to optimize a set of feature transforms, each concentrating on one region of the acoustic space. Previous results have shown that RDT significantly reduces recognition error in a large-vocabulary speaker-independent (SI) system. As a follow-up investigation, this paper reports recent progress in applying RDT to speaker-adaptive training (SAT). Consistent with the previous SI results, integrating RDT with SAT yields a 7% relative improvement in word error rate (WER). The paper also compares RDT theoretically with other discriminative feature extraction methods, including the improved version of feature-space MPE (fMPE) that uses the "mean-offsets" as additional input features.
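To make the idea of "a set of feature transforms, each concentrating on a region of the acoustic space" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the regions are modeled by a diagonal-covariance Gaussian mixture and that the output feature is the posterior-weighted sum of per-region affine projections; the abstract states neither detail. The function names `region_posteriors` and `rdt_transform` and all parameter names are hypothetical, and the MPE training of the transforms is not shown.

```python
import numpy as np

def region_posteriors(x, means, variances, weights):
    """Posterior probability of each region given frame x.

    Assumption: regions are components of a diagonal-covariance GMM.
    x: (D,), means/variances: (R, D), weights: (R,)
    """
    diff = x - means
    # Per-component Gaussian log-likelihood with diagonal covariance.
    log_lik = -0.5 * np.sum(diff ** 2 / variances
                            + np.log(2.0 * np.pi * variances), axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)          # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def rdt_transform(x, transforms, biases, post):
    """Posterior-weighted sum of per-region affine projections.

    transforms: (R, P, D), biases: (R, P), post: (R,)
    Returns the transformed feature of shape (P,).
    """
    projected = transforms @ x + biases   # (R, P): one projection per region
    return post @ projected               # weight each region by its posterior
```

In this sketch, a frame that lies close to one region's mean is transformed almost entirely by that region's matrix, while frames between regions receive a blend of the neighbouring transforms. In a trained system the matrices in `transforms` would be the parameters optimized under the MPE criterion.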
Cite as: Zhang, B., Matsoukas, S., Schwartz, R. (2006) Recent progress on the discriminative region-dependent transform for speech feature extraction. Proc. Interspeech 2006, paper 1573-Wed1A2O.5, doi: 10.21437/Interspeech.2006-427
@inproceedings{zhang06e_interspeech,
  author={Bing Zhang and Spyros Matsoukas and Richard Schwartz},
  title={{Recent progress on the discriminative region-dependent transform for speech feature extraction}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1573-Wed1A2O.5},
  doi={10.21437/Interspeech.2006-427}
}