In this work, Region Dependent Transform (RDT) is used as a feature extraction process to combine the traditional short-term acoustic features with the features derived from Multi-Layer Perceptrons (MLP) which is trained from the long-term features. When compared to the conventional feature augmentation approach, substantial improvement is obtained. Moreover, an improved RDT training procedure in which speaker dependent transforms are take into account is proposed for feature combination in the Speaker Adaptive Training. By incorporating the higher dimensional features output from the layer prior to the bottleneck layer into our Speech-to-Text (STT) system using RDT, significant improvement is achieved as compared to using the conventional bottleneck features. In summary, by using the features derived from MLP with RDT, 8.2% to 11.4% relative reduction in Character Error Rate is achieved for our Mandarin STT systems.
Bibliographic reference. Ng, Tim / Zhang, Bing / Matsoukas, Spyros / Nguyen, Long (2011): "Region dependent transform on MLP features for speech recognition", In INTERSPEECH-2011, 221-224.