This paper describes a speech recognition method for mixed sound consisting of speech and music, in which only the music is removed using vector quantization (VQ) and non-negative matrix factorization (NMF). For isolated word recognition with a clean speech model, removing the music improved recognition by about 15% over not removing it. Furthermore, a high recognition rate of about 90% was achieved even under the 0 dB condition, using a model trained on mixed sound from which the music had been removed by the VQ method.
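The abstract does not say which NMF variant is used. As a rough illustration of how NMF can remove music from a mixture, the following is a minimal sketch of one common semi-supervised scheme, not the authors' method. It assumes magnitude spectrograms, a music-only recording from which music bases are learned in advance, a Euclidean cost with Lee-Seung multiplicative updates, and a soft (Wiener-style) mask. The names `nmf` and `remove_music` are hypothetical, the data are synthetic, and the VQ component is not modelled.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the updates and the mask


def nmf(V, k, n_iter=200, seed=0):
    """Plain NMF V ~ W @ H with Euclidean cost and multiplicative updates."""
    rng = np.random.default_rng(seed)
    f, t = V.shape
    W = rng.random((f, k)) + EPS
    H = rng.random((k, t)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def remove_music(V_mix, W_music, k_speech, n_iter=200, seed=0):
    """Semi-supervised NMF: music bases stay fixed; speech bases and all
    activations are learned from the mixture. Returns a speech magnitude
    estimate obtained by masking the mixture (assumed scheme)."""
    rng = np.random.default_rng(seed)
    f, t = V_mix.shape
    k_music = W_music.shape[1]
    W_speech = rng.random((f, k_speech)) + EPS
    H = rng.random((k_speech + k_music, t)) + EPS
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_music])
        H *= (W.T @ V_mix) / (W.T @ W @ H + EPS)
        H_speech = H[:k_speech]
        # Update only the speech bases; the music bases are held fixed.
        W_speech *= (V_mix @ H_speech.T) / (W @ H @ H_speech.T + EPS)
    W = np.hstack([W_speech, W_music])
    speech_part = W_speech @ H[:k_speech]
    mask = speech_part / (W @ H + EPS)  # values in [0, 1)
    return mask * V_mix


# Toy usage with synthetic nonnegative "spectrograms" (not real audio).
rng = np.random.default_rng(1)
V_music = rng.random((64, 100))
W_music, H_music = nmf(V_music, k=8)
V_mix = rng.random((64, 50)) + V_music[:, :50]
S_hat = remove_music(V_mix, W_music, k_speech=8)
```

Fixing the music bases forces the music-only activations to absorb the musical part of the mixture, so the remaining speech bases explain what is left. The mask keeps the estimate nonnegative and never larger than the mixture in any time-frequency bin.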
Bibliographic reference. Nakano, Shoichi / Yamamoto, Kazumasa / Nakagawa, Seiichi (2011): "Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization", In INTERSPEECH-2011, 1781-1784.