SAPA-SCALE Conference 2012

Portland, OR, USA
September 7-8, 2012

Data-driven Speech Representations for NMF-based Word Learning

Joris Driesen, Jort F. Gemmeke, Hugo Van hamme

Dept. ESAT, KU Leuven, Leuven, Belgium

State-of-the-art solutions in ASR often rely on large amounts of expert prior knowledge, which is undesirable in some applications. In this paper, we consider an NMF-based framework that learns a small vocabulary of words directly from input data, without prior knowledge such as phone sets and dictionaries. In the context of this learning scheme, we compare several spectral representations of speech. Where necessary, we propose changes to their derivation to avoid the use of prior linguistic knowledge. In addition, by comparing several acoustic modelling techniques, we determine which model properties benefit the framework’s performance.

Index Terms: keyword learning, non-negative matrix factorisation, clustering, acoustic modelling

Bibliographic reference. Driesen, Joris / Gemmeke, Jort F. / Van hamme, Hugo (2012): "Data-driven speech representations for NMF-based word learning", in SAPA-SCALE-2012, 98-103.