11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Hierarchical Bottle Neck Features for LVCSR

Christian Plahl, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

This paper investigates the combination of different neural network topologies for probabilistic feature extraction. On the one hand, a five-layer neural network used in bottle neck feature extraction makes it possible to obtain features of arbitrary size, independently of the training targets and without a dimensionality-reducing transform. On the other hand, a hierarchical processing technique is effective and robust over several conditions. Although hierarchical and bottle neck processing perform equally well, the combination of both topologies improves the system by 5% relative. Furthermore, the MFCC baseline system is improved by up to 20% relative. This behaviour is confirmed on two different tasks. In addition, we analyse the influence of multi-resolution RASTA filtering and long-term spectral features as input for the neural network feature extraction.
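
As an illustration of the bottle neck idea described in the abstract, the following is a minimal sketch of a five-layer network (input, hidden, bottle neck, hidden, output) whose bottle neck activations are read out as features. It is not the authors' implementation: the layer sizes, the random untrained weights, the sigmoid and softmax nonlinearities, and the choice to read the bottle neck activations before the nonlinearity are all assumptions made for the example.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes, not taken from the paper.
N_INPUT, N_HIDDEN, N_BOTTLENECK, N_TARGETS = 30, 50, 8, 12

rng = random.Random(0)
layers = [
    make_layer(N_INPUT, N_HIDDEN, rng),       # input -> first hidden layer
    make_layer(N_HIDDEN, N_BOTTLENECK, rng),  # first hidden layer -> bottle neck
    make_layer(N_BOTTLENECK, N_HIDDEN, rng),  # bottle neck -> third hidden layer
    make_layer(N_HIDDEN, N_TARGETS, rng),     # third hidden layer -> output
]


def forward(x):
    """Return (bottle neck features, class posteriors) for one input frame."""
    h1 = sigmoid(affine(layers[0], x))
    # Assumption: features are the linear bottle neck activations.
    bottleneck = affine(layers[1], h1)
    h3 = sigmoid(affine(layers[2], sigmoid(bottleneck)))
    posteriors = softmax(affine(layers[3], h3))
    return bottleneck, posteriors


frame = [rng.gauss(0.0, 1.0) for _ in range(N_INPUT)]
features, posteriors = forward(frame)
```

In this sketch the feature dimension is set by `N_BOTTLENECK` alone, so changing the number of training targets (`N_TARGETS`) leaves the feature size untouched and no separate transform is needed to shrink the output.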

Bibliographic reference.  Plahl, Christian / Schlüter, Ralf / Ney, Hermann (2010): "Hierarchical bottle neck features for LVCSR", In INTERSPEECH-2010, 1197-1200.