12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Speech Recognition in Mixed Sound of Speech and Music Based on Vector Quantization and Non-Negative Matrix Factorization

Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

Toyohashi University of Technology, Japan

This paper describes a speech recognition method for mixed sound consisting of speech and music. The method removes only the music, using vector quantization (VQ) and non-negative matrix factorization (NMF). For isolated word recognition with a clean speech model, an improvement of about 15% was obtained compared with not removing the music. Furthermore, a high recognition rate of about 90% was achieved even under the 0 dB condition when using a model trained on mixed sound from which the music had been removed by the VQ method.
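The abstract does not give algorithmic details, so the following is only a rough illustration of one common way NMF can be used to suppress music in the magnitude spectrogram of a mixture. It is not the authors' implementation. The function names `nmf` and `remove_music`, the Euclidean cost with multiplicative updates, the soft mask, and the use of speech and music bases learned beforehand are all assumptions made for this sketch. The VQ step is not shown.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost.
    If W is supplied, it is kept fixed and only H is estimated
    (in that case `rank` is ignored).
    """
    rng = np.random.default_rng(seed)
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if not fixed_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def remove_music(V_mix, W_speech, W_music, n_iter=200):
    """Estimate the speech part of a mixture spectrogram (hypothetical helper).

    Assumption: W_speech and W_music were learned beforehand, e.g. by
    calling nmf() on speech-only and music-only spectrograms.
    """
    W = np.hstack([W_speech, W_music])            # joint dictionary, kept fixed
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]                          # speech reconstruction
    M = W_music @ H[k:]                           # music reconstruction
    mask = S / (S + M + EPS)                      # soft mask in [0, 1]
    return mask * V_mix
```

In a pipeline along these lines, the masked spectrogram returned by `remove_music` would be passed on to feature extraction for the recognizer. How the paper combines this kind of step with VQ is described in the full paper, not here.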

Bibliographic reference.  Nakano, Shoichi / Yamamoto, Kazumasa / Nakagawa, Seiichi (2011): "Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization", In INTERSPEECH-2011, 1781-1784.