ISCA Workshop on
Statistical And Perceptual Audition

Makuhari, Japan
September 25, 2010

Supervised vs. Unsupervised Learning of Spectro Temporal Speech Features

Martin Heckmann

Honda Research Institute Europe GmbH, Offenbach/Main, Germany

To overcome the limitations of purely spectral speech features, we previously introduced Hierarchical Spectro-Temporal (HIST) features. We showed that combining HIST and standard features reduces recognition errors in both clean and noisy conditions. The HIST features consist of two hierarchical layers whose filter functions are learned in a data-driven manner. In this paper we investigate how the choice of learning method for the second-layer filters influences recognition performance. We compare Non-negative Matrix Factorization (NMF), Non-negative Sparse Coding (NNSC), and Weight Coding (WC) on a noisy digit recognition task. NMF and NNSC are unsupervised learning algorithms, whereas WC also incorporates class-specific information into the learning process. Additionally, we investigate how a mismatch between the database used to learn the features and the database used to train and test the recognition system affects performance.

Index Terms: Spectro-temporal, NMF, NNSC, WC, robust speech recognition, auditory
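The abstract contrasts NMF and NNSC as unsupervised ways of learning the second-layer filters. As a rough illustration only (not the paper's implementation; its feature dimensions, ranks and training settings are not given here), the sketch below factorizes a nonnegative matrix V ≈ WH using the Lee-Seung multiplicative updates for the Euclidean cost. Setting the hypothetical `sparsity` parameter above zero adds an L1 penalty on H in the spirit of Hoyer's NNSC; the W step is simplified to a multiplicative update followed by column normalization rather than Hoyer's projected-gradient step. In this toy setting the columns of W stand in for learned filters. WC is omitted because it also uses class labels and its details are not described in the abstract. All function names (`factorize`, `matmul`, `frobenius_error`) are hypothetical helpers written for this example.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Euclidean reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def factorize(v: Matrix, rank: int, iters: int = 200, sparsity: float = 0.0,
              seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: sparsity == 0 gives plain NMF (Lee-Seung updates);
    sparsity > 0 adds an NNSC-style L1 penalty on the activations H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps the multiplicative updates nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H + lambda)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + sparsity + eps)
              for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n)]
        if sparsity > 0:
            # Unit-norm columns of W stop the L1 penalty from being dodged by
            # shrinking H and inflating W; rescaling H keeps WH unchanged.
            for k in range(rank):
                norm = sum(w[i][k] ** 2 for i in range(n)) ** 0.5 or 1.0
                for i in range(n):
                    w[i][k] /= norm
                for j in range(m):
                    h[k][j] *= norm
    return w, h


if __name__ == "__main__":
    # Toy nonnegative data of rank 2 (made up for illustration).
    V = matmul([[1, 0], [0, 1], [1, 1], [2, 1]],
               [[1, 2, 0, 1, 3, 0], [0, 1, 2, 1, 0, 3]])
    W, H = factorize(V, rank=2, iters=300)
    print("NMF error:", frobenius_error(V, W, H))
    Ws, Hs = factorize(V, rank=2, iters=300, sparsity=0.5)
    print("NNSC-style error:", frobenius_error(V, Ws, Hs))
```

The only difference between the two variants in this sketch is the penalty term in the H update and the accompanying normalization of W, which mirrors, in simplified form, how NNSC trades some reconstruction accuracy for sparser activations.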


Bibliographic reference: Heckmann, Martin (2010): "Supervised vs. unsupervised learning of spectro temporal speech features", in SAPA-2010, pp. 1-6.