We propose a novel technique for enhancing the singing voice in monaural music audio signals by capturing the fluctuation of the singing voice on the spectrogram. Using multiple spectrogram representations, the method separates an input signal into three components: stationary, fluctuated, and transient. The singing voice is contained mainly in the fluctuated component. The proposed algorithm applies, in two stages, a sinusoidal/non-sinusoidal separation algorithm that we recently developed, called harmonic/percussive sound separation (HPSS). In the first stage, we filter out the stationary component using HPSS analysis with a long frame; in the second stage, we filter out the transient component using HPSS analysis with a short frame. Experiments show that the proposed method effectively enhances the singing voice in music, and an application to melody extraction further supports its effectiveness.
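The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the HPSS step is replaced here by the simpler, well-known median-filtering variant of harmonic/percussive separation, and the frame lengths (4096 and 512 samples), the kernel size, and the function names `hpss_median` and `three_way_split` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import istft, stft


def _fit_length(y, n):
    """Zero-pad or trim y so that it has exactly n samples."""
    return np.pad(y, (0, max(0, n - len(y))))[:n]


def hpss_median(x, fs, nperseg, kernel=17):
    """Split x into (harmonic-like, percussive-like) signals.

    ASSUMPTION: median-filtering HPSS is used as a stand-in for the
    HPSS algorithm named in the abstract.
    """
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(X)  # shape: (frequency bins, time frames)
    # Median along time keeps horizontal ridges (sustained partials).
    harm = median_filter(mag, size=(1, kernel))
    # Median along frequency keeps vertical ridges (onsets, clicks).
    perc = median_filter(mag, size=(kernel, 1))
    # Soft mask; the two masks sum to one, so the outputs sum to the input.
    mask_h = harm**2 / (harm**2 + perc**2 + 1e-10)
    _, h = istft(X * mask_h, fs=fs, nperseg=nperseg)
    _, p = istft(X * (1.0 - mask_h), fs=fs, nperseg=nperseg)
    return _fit_length(h, len(x)), _fit_length(p, len(x))


def three_way_split(x, fs, long_frame=4096, short_frame=512):
    """Return (stationary, fluctuated, transient) components of x.

    ASSUMPTION: the frame lengths are illustrative, not taken from the paper.
    """
    # Stage 1, long frame: the harmonic-like output is the stationary part;
    # everything else (including the fluctuating voice) is passed on.
    stationary, residual = hpss_median(x, fs, long_frame)
    # Stage 2, short frame: the percussive-like output is the transient part;
    # what remains is the fluctuated component.
    fluctuated, transient = hpss_median(residual, fs, short_frame)
    return stationary, fluctuated, transient
```

Calling `three_way_split(x, fs)` on a mono signal `x` sampled at `fs` Hz returns three arrays of the same length as `x`. Because soft masks are used, the three components add back up to the input, so the fluctuated component can be listened to or passed to a melody extractor without any separate resynthesis step.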
Cite as: Tachibana, H., Ono, N., Sagayama, S. (2010) Singing voice enhancement for monaural music signals based on multiple time-frequency analysis. Proc. First Interdisciplinary Workshop on Singing Voice (InterSinging 2010), 35-38
@inproceedings{tachibana10b_intersinging,
  author={Hideyuki Tachibana and Nobutaka Ono and Shigeki Sagayama},
  title={{Singing voice enhancement for monaural music signals based on multiple time-frequency analysis}},
  year=2010,
  booktitle={Proc. First Interdisciplinary Workshop on Singing Voice (InterSinging 2010)},
  pages={35--38}
}