ISCA Archive SAPA 2012

Data-driven speech representations for NMF-based word learning

Joris Driesen, Jort F. Gemmeke, Hugo Van hamme

State-of-the-art solutions in ASR often rely on large amounts of expert prior knowledge, which is undesirable in some applications. In this paper, we consider an NMF-based framework that learns a small vocabulary of words directly from input data, without prior knowledge such as phone sets and dictionaries. In the context of this learning scheme, we compare several spectral representations of speech. Where necessary, we propose changes to their derivation to avoid the use of prior linguistic knowledge. Also, in a comparison of several acoustic modelling techniques, we determine which model properties are beneficial to the framework’s performance.

Index Terms: keyword learning, non-negative matrix factorisation, clustering, acoustic modelling

Cite as: Driesen, J., Gemmeke, J.F., Van hamme, H. (2012) Data-driven speech representations for NMF-based word learning. Proc. SAPA-SCALE conference (SAPA 2012), 98-103

  author={Joris Driesen and Jort F. Gemmeke and Hugo {Van hamme}},
  title={{Data-driven speech representations for NMF-based word learning}},
  booktitle={Proc. SAPA-SCALE conference (SAPA 2012)},