Most single-channel speech separation (SCSS) systems use the short-time Fourier transform as their parametric features. Recent studies have shown that employing sinusoidal features for SCSS results in high perceived speech quality. In this paper, we present a systematic study of automatic speech recognition results for an SCSS system that uses sinusoidal features composed of amplitude and frequency. We compare the speech recognition results with those already reported by other participants in the single-channel speech separation and recognition challenge. Our results show that the newly proposed system achieves an overall recognition accuracy of 52.3%, which ranks at the median among all other participants in the challenge.
Bibliographic reference. Mowlaee, P. / Saeidi, R. / Tan, Zheng-Hua / Christensen, M. G. / Kinnunen, Tomi / Fränti, P. / Jensen, S. H. (2011): "Sinusoidal approach for the single-channel speech separation and recognition challenge", In INTERSPEECH-2011, 677-680.