13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Are Sparse Representations Rich Enough for Acoustic Modeling?

Oriol Vinyals (1), Li Deng (2)

(1) University of California at Berkeley, Berkeley, CA, USA
(2) Microsoft Research, Redmond, WA, USA

We propose a novel approach to acoustic modeling based on recent advances in sparse representations. The key idea in sparse coding is to compute a compressed local representation of a signal via an over-complete basis, or dictionary, that is learned in an unsupervised way. In this study, we treat the speech spectrogram as the raw “signal”, compute its local sparse code, and use that code to perform a standard phone classification task. A linear classifier operates directly on the sparse codes to make the classification decision. The simplicity of the linear classifier allows us to assess whether the sparse representations are sufficiently rich to serve as effective acoustic features for discriminating speech classes. Our experiments demonstrate competitive error rates when compared to other shallow approaches. An examination of the learned dictionary shows that collections of dictionary entries capture meaningful acoustic-phonetic properties.

Index Terms: sparse coding, acoustic modeling, phone recognition, acoustic-phonetic properties
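
The pipeline summarized in the abstract (local spectrogram patches, an over-complete dictionary learned without labels, sparse codes, and a linear classifier on those codes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a solver, a dictionary-learning procedure, or a classifier training rule, so the sketch assumes ISTA for the sparse codes, an alternating least-squares dictionary update, and ridge regression onto one-hot targets as the linear classifier. All function names, the patch width, and the value of `lam` are hypothetical.

```python
import numpy as np

def extract_patches(spec, width):
    """Stack `width` consecutive frames of a (frames, bins) spectrogram
    into one flattened vector per starting frame."""
    n = spec.shape[0] - width + 1
    return np.stack([spec[i:i + width].ravel() for i in range(n)])

def ista_sparse_code(X, D, lam=0.1, n_iter=100):
    """Approximately solve min_Z 0.5*||X - Z D||_F^2 + lam*||Z||_1 with ISTA.
    X: (n, d) patches; D: (k, d) dictionary, one atom per row; returns Z: (n, k)."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the smooth term's gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        V = Z - ((Z @ D - X) @ D.T) / L                          # gradient step
        Z = np.sign(V) * np.maximum(np.abs(V) - lam / L, 0.0)    # soft threshold
    return Z

def learn_dictionary(X, k, lam=0.1, n_outer=10, seed=0):
    """Unsupervised dictionary learning by alternating sparse coding and a
    least-squares atom update (an assumed, simple procedure)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((k, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    for _ in range(n_outer):
        Z = ista_sparse_code(X, D, lam)
        D = np.linalg.lstsq(Z, X, rcond=None)[0]   # solves Z D ~= X for D
        dead = np.linalg.norm(D, axis=1) < 1e-8    # atoms no patch used
        D[dead] = rng.standard_normal((int(dead.sum()), X.shape[1]))
        D /= np.linalg.norm(D, axis=1, keepdims=True)
    return D

def fit_linear_classifier(Z, y, n_classes, reg=1e-2):
    """Ridge regression onto one-hot targets; returns weights W: (k, n_classes)."""
    Y = np.eye(n_classes)[y]
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + reg * np.eye(k), Z.T @ Y)

def predict(Z, W):
    """Pick the class with the largest linear score."""
    return np.argmax(Z @ W, axis=1)
```

With a dictionary larger than the patch dimension (k > d), the codes are over-complete, and the L1 penalty drives most coefficients to exactly zero. Because the classifier is linear, any discriminative power in this sketch has to come from the codes themselves, which is the property the abstract sets out to test.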

Bibliographic reference.  Vinyals, Oriol / Deng, Li (2012): "Are sparse representations rich enough for acoustic modeling?", In INTERSPEECH-2012, 2570-2573.