11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

State-Based Labelling for a Sparse Representation of Speech and Its Application to Robust Speech Recognition

Tuomas Virtanen (1), Jort F. Gemmeke (2), Antti Hurmalainen (1)

(1) Tampere University of Technology, Finland
(2) Radboud Universiteit Nijmegen, The Netherlands

This paper proposes a state-based labelling for acoustic patterns of speech and a method for using this labelling in noise-robust automatic speech recognition. Acoustic time-frequency segments of speech, called exemplars, are obtained from a training database and associated with time-varying state labels using the transcriptions. In the recognition phase, noisy speech is modeled by a sparse linear combination of noise and speech exemplars. The likelihoods of states are obtained by a linear combination of the exemplar weights, and these likelihoods can then be used to estimate the most likely state transition path. The proposed method was tested in the connected digit recognition task with noisy speech material from the Aurora-2 database, where it produced better results than the existing histogram-based labelling method.
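The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the sparse exemplar weights have already been estimated, that each speech exemplar covers a single frame (the paper uses multi-frame exemplars with time-varying labels), and that the label matrix, transition matrix, and all numeric values are hypothetical. The function names `state_likelihoods` and `viterbi` are illustrative only.

```python
import math


def state_likelihoods(label_matrix, speech_weights):
    """Map exemplar weights to state likelihoods by a linear combination.

    label_matrix[q][k] is an assumed (hypothetical) association between
    speech exemplar k and state q; speech_weights[k] is the weight of
    exemplar k in the sparse representation of one frame.
    """
    return [sum(l * w for l, w in zip(row, speech_weights))
            for row in label_matrix]


def _log(p):
    # Logarithm that maps zero probability to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(frame_likelihoods, transition, initial):
    """Return the most likely state path given per-frame state likelihoods."""
    n_states = len(initial)
    score = [_log(initial[q]) + _log(frame_likelihoods[0][q])
             for q in range(n_states)]
    backptrs = []
    for obs in frame_likelihoods[1:]:
        new_score, ptr = [], []
        for q in range(n_states):
            # Best predecessor of state q at this frame.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + _log(transition[p][q]))
            ptr.append(best_prev)
            new_score.append(score[best_prev]
                             + _log(transition[best_prev][q])
                             + _log(obs[q]))
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda q: score[q])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy data (all values are made up): 2 states, 3 speech exemplars, 3 frames.
labels = [[1.0, 0.5, 0.0],
          [0.0, 0.5, 1.0]]
weights_per_frame = [[0.9, 0.1, 0.0],
                     [0.2, 0.6, 0.2],
                     [0.0, 0.1, 0.8]]
likelihoods = [state_likelihoods(labels, w) for w in weights_per_frame]
# Left-to-right model: state 0 may stay or advance, state 1 is absorbing.
path = viterbi(likelihoods, [[0.7, 0.3], [0.0, 1.0]], [1.0, 0.0])
print(path)
```

With these toy values the decoded path starts in state 0 and moves to state 1 at the second frame. In a noise-robust setting, only the weights of the speech exemplars would enter `state_likelihoods`; the noise exemplar weights would be left out of the mapping.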

Bibliographic reference. Virtanen, Tuomas / Gemmeke, Jort F. / Hurmalainen, Antti (2010): "State-based labelling for a sparse representation of speech and its application to robust speech recognition", in INTERSPEECH-2010, 893-896.