Symposium on Machine Learning in Speech and Language Processing (MLSLP)

Bellevue, WA, USA
June 27, 2011

A Two-Layer Non-negative Matrix Factorization Model for Vocabulary Discovery

Meng Sun, Hugo Van hamme

Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Belgium

A two-layer non-negative matrix factorization (NMF) model is proposed for vocabulary discovery. Layer-1 extracts low-level vocabulary patterns from a histogram of co-occurrences of Gaussians. Latent units are then discovered by spectral embedding of the Gaussians at layer-1. Layer-2 discovers vocabulary patterns from the histogram of co-occurrences of these latent units. On the Aurora2/Clean database, the two-layer model improves the unordered word error rate over the low-level representation. The relation between the latent units and the states of an HMM is discussed.
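The layer-1 idea of factoring co-occurrence histograms can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each frame is reduced to a single hard Gaussian index (the paper's actual front end is not described in this abstract), it counts pairs at one fixed lag, it uses the standard multiplicative updates for the Kullback-Leibler divergence as an assumed NMF solver, and it omits the spectral embedding step and layer-2. The names `cooccurrence_histogram`, `nmf_kl`, and the toy utterances are hypothetical.

```python
import numpy as np


def cooccurrence_histogram(labels, n_symbols, lag=1):
    """Count ordered pairs (labels[t], labels[t + lag]) in one utterance.

    Assumes lag >= 1 and integer labels in [0, n_symbols).
    """
    labels = np.asarray(labels)
    # Encode each ordered pair (a, b) as the single index a * n_symbols + b.
    pair_ids = labels[:-lag] * n_symbols + labels[lag:]
    return np.bincount(pair_ids, minlength=n_symbols * n_symbols).astype(float)


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor V ~= W @ H using multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        R = V / (W @ H + eps)                    # element-wise ratio V / (WH)
        W *= (R @ H.T) / (H.sum(axis=1) + eps)   # update pattern columns
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)  # update activations
    return W, H


# Toy data (hypothetical): three utterances over 3 Gaussian indices.
utterances = [[0, 1, 0, 1], [1, 2, 1, 2], [0, 1, 0, 1, 1, 2, 1, 2]]
# One column per utterance, one row per ordered pair of Gaussian indices.
V = np.stack([cooccurrence_histogram(u, 3) for u in utterances], axis=1)
W, H = nmf_kl(V, rank=2)
```

In this sketch the columns of `W` play the role of recurring co-occurrence patterns and the columns of `H` indicate how strongly each pattern is present in each utterance. A second layer would repeat the same construction with latent-unit indices in place of Gaussian indices.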

Index Terms: non-negative matrix factorization, hidden Markov models, speech recognition


Bibliographic reference. Sun, Meng / Van hamme, Hugo (2011): "A two-layer non-negative matrix factorization model for vocabulary discovery", in MLSLP-2011, 11-15.