This paper describes the current improvements of the RWTH Mandarin LVCSR system. We introduce vocal tract length normalization for the Gammatone features and present comparable results for Gammatone based feature extraction and classical feature extraction. In order to benefit from the huge amount of data of 1600h available in the GALE project we have trained the acoustic models up to 8M Gaussians. We present detailed character error rates for the different number of Gaussians.
Different kinds of systems are developed and a two stage decoding framework is applied, which uses cross-adaptation and a subsequent lattice-based system combination. In addition to various acoustic front-ends, these systems use different kinds of neural network toneme posterior features. We present detailed recognition results of the development cycle and the different acoustic front-ends of the systems. Finally, we compare the ultimate evaluation system to our last years system and can report a 10% relative improvement.
Bibliographic reference. Plahl, C. / Hoffmeister, Björn / Heigold, Georg / Lööf, Jonas / Schlüter, Ralf / Ney, Hermann (2009): "Development of the GALE 2008 Mandarin LVCSR system", In INTERSPEECH-2009, 2107-2110.