We propose a novel approach to acoustic modeling based on recent advances in sparse representations. The key idea in sparse coding is to compute a compressed local representation of a signal using an over-complete basis, or dictionary, that is learned in an unsupervised way. In this study, we treat the speech spectrogram as the raw signal, compute its local sparse code, and use that code as the feature representation for a standard phone classification task. A linear classifier operates directly on the sparse codes to make the classification decision. The simplicity of the linear classifier allows us to assess whether the sparse representations are sufficiently rich to serve as effective acoustic features for discriminating speech classes. Our experiments demonstrate competitive error rates when compared to other shallow approaches. An examination of the dictionary learned during sparse feature extraction shows that collections of dictionary entries capture meaningful acoustic-phonetic properties.
Index Terms: sparse coding, acoustic modeling, phone recognition, acoustic-phonetic properties
Bibliographic reference. Vinyals, Oriol / Deng, Li (2012): "Are sparse representations rich enough for acoustic modeling?", In INTERSPEECH-2012, 2570-2573.
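Illustrative sketch. The pipeline summarized in the abstract (sparse-code a local signal against an over-complete dictionary, then pass the code to a linear classifier) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which sparse solver or dictionary-learning method was used, so greedy matching pursuit is swapped in here, and the dictionary, the two-dimensional "patch", and the classifier weights are all hypothetical values chosen by hand rather than learned from speech data.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_nonzero):
    """Greedy sparse coding (an assumed stand-in for the paper's solver).

    Runs n_nonzero greedy steps; each step selects the atom most correlated
    with the current residual and subtracts its contribution.
    Returns the code and the final residual.
    """
    residual = list(signal)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        corrs = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(corrs[i]))
        code[k] += corrs[k]
        residual = [r - corrs[k] * a for r, a in zip(residual, dictionary[k])]
    return code, residual

def linear_scores(code, weights, biases):
    """One score per class: w . code + b."""
    return [dot(w, code) + b for w, b in zip(weights, biases)]

# Hypothetical over-complete dictionary: 4 unit-norm atoms for 2-D inputs.
# In the paper the dictionary is learned without supervision; here it is fixed.
DICTIONARY = [normalize(a) for a in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0])]

# Hypothetical two-class linear classifier acting directly on the sparse code.
WEIGHTS = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
BIASES = [0.0, 0.0]

patch = [3.0, 3.0]  # stand-in for a local spectrogram patch
code, residual = matching_pursuit(patch, DICTIONARY, n_nonzero=1)
scores = linear_scores(code, WEIGHTS, BIASES)
predicted = max(range(len(scores)), key=lambda i: scores[i])
```

With an over-complete dictionary, a single well-matched atom can account for the whole input, which is why the code stays sparse while still carrying enough information for a linear decision rule. A realistic system would learn the dictionary from spectrogram patches and train the classifier on labeled phone segments.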