Posterior features have been shown to yield very good performance in multiple
contexts, including speech recognition, spoken term detection, and template matching.
Nowadays, posterior features are usually estimated at the output of a neural
network. More recently, sparse representation has also been shown to offer
potential advantages in discrimination and robustness. One
instance of this is exemplar-based sparse representation.
The present work investigates how to combine sparse modelling with the properties of posterior spaces to further improve speech recognition features. In this context, we leverage exemplar-based sparse representation and propose a novel approach that projects phone posterior features into a new, high-dimensional, sparse feature space. Specifically, by exploiting the properties of posterior spaces, we generate new, high-dimensional, linguistically inspired (sub-phone and word) posterior distributions. Validation experiments on the Phonebook (isolated words) and HIWIRE (continuous speech) databases support the effectiveness of the proposed approach for speech recognition tasks.
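As a rough illustration of the general idea of exemplar-based sparse coding in a posterior space, the Python sketch below codes a phone posterior vector as a sparse, non-negative combination of exemplar posterior vectors, normalises the code into a distribution over exemplars, and sums the weights by exemplar label to obtain a coarser (here word-level) distribution. This is not the authors' method: the abstract does not specify the dictionary construction or the solver, so the non-negative lasso objective, the projected ISTA solver, the penalty `lam`, the toy dictionary, the labels, and all function names are hypothetical assumptions.

```python
"""Sketch: exemplar-based sparse coding of a phone posterior vector.

Hypothetical illustration only: the non-negative lasso objective, the projected
ISTA solver, the toy dictionary and the label pooling are assumptions, not the
method of the paper.
"""
from collections import defaultdict


def sparse_code(z, exemplars, lam=0.05, n_iter=2000):
    """Return a >= 0 approximately minimising 0.5*||z - D a||^2 + lam*sum(a).

    `exemplars` holds the K columns of the dictionary D, each a length-P
    posterior vector; `z` is the length-P test posterior vector.
    """
    k, p = len(exemplars), len(z)
    # The squared Frobenius norm of D upper-bounds the largest eigenvalue of
    # D^T D (the gradient's Lipschitz constant), so 1/lip is a safe step size.
    lip = sum(v * v for col in exemplars for v in col)
    step = 1.0 / lip
    a = [0.0] * k
    for _ in range(n_iter):
        # Residual r = D a - z, computed once per iteration.
        resid = [sum(a[j] * exemplars[j][i] for j in range(k)) - z[i]
                 for i in range(p)]
        new_a = []
        for j in range(k):
            # Gradient component (D^T r)_j, then the proximal step for
            # lam*sum(a) under a >= 0: shift by step*lam and clip at zero.
            grad_j = sum(exemplars[j][i] * resid[i] for i in range(p))
            new_a.append(max(0.0, a[j] - step * (grad_j + lam)))
        a = new_a
    return a


def exemplar_posteriors(a):
    """Normalise the sparse code into a distribution over the K exemplars."""
    total = sum(a)
    if total == 0.0:
        return [1.0 / len(a)] * len(a)
    return [v / total for v in a]


def pool_by_label(weights, labels):
    """Sum exemplar weights that share a label (e.g. a word or sub-phone unit)."""
    pooled = defaultdict(float)
    for w, lab in zip(weights, labels):
        pooled[lab] += w
    return dict(pooled)


# Toy dictionary over 3 phones; the word labels are hypothetical.
EXEMPLARS = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
LABELS = ["yes", "yes", "no", "maybe"]

weights = exemplar_posteriors(sparse_code([0.75, 0.15, 0.10], EXEMPLARS))
print(pool_by_label(weights, LABELS))  # most of the mass falls on "yes"
```

In this sketch the sparse code has one entry per exemplar, so its dimension is set by the dictionary size rather than by the number of phones; summing by label is just one simple way to turn it into a sub-phone- or word-level distribution.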
Bibliographic reference. Bahaadini, Sara / Asaei, Afsaneh / Imseng, David / Bourlard, Hervé (2014): "Posterior-based sparse representation for automatic speech recognition", In INTERSPEECH-2014, 2454-2458.