12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge

P. Mowlaee (1), R. Saeidi (2), Zheng-Hua Tan (3), M. G. Christensen (3), Tomi Kinnunen (2), P. Fränti (2), S. H. Jensen (3)

(1) Ruhr-Universität Bochum, Germany
(2) University of Eastern Finland, Finland
(3) Aalborg University, Denmark

Most single-channel speech separation (SCSS) systems use the short-time Fourier transform as their parametric features. Recent studies have shown that employing sinusoidal features for SCSS results in high perceived speech quality. In this paper, we present a systematic study of automatic speech recognition results for an SCSS system that uses sinusoidal features composed of amplitude and frequency. We compare the speech recognition results with those already reported by other participants in the single-channel speech separation and recognition challenge. Our results show that a newly proposed system achieves an overall recognition accuracy of 52.3%, which ranks at the median among all other participants in the challenge.
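To illustrate what "sinusoidal features composed of amplitude and frequency" can look like for a single frame, here is a minimal sketch based on generic spectral peak picking. It is not taken from the paper and should not be read as the authors' estimator: the function name `sinusoidal_features`, the Hann window, and the parameter `num_peaks` are all assumptions made for illustration.

```python
import numpy as np

def sinusoidal_features(frame, sample_rate, num_peaks=5):
    """Return (frequencies_hz, amplitudes) of the strongest spectral peaks.

    Hypothetical helper: plain peak picking on a Hann-windowed magnitude
    spectrum, used here only as a stand-in for a sinusoidal parameter estimator.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))

    # Local maxima: interior bins strictly larger than both neighbours.
    interior = spectrum[1:-1]
    is_peak = (interior > spectrum[:-2]) & (interior > spectrum[2:])
    peak_bins = np.flatnonzero(is_peak) + 1

    # Keep the num_peaks largest peaks, then order them by frequency.
    order = np.argsort(spectrum[peak_bins])[::-1][:num_peaks]
    strongest = np.sort(peak_bins[order])

    freqs = strongest * sample_rate / len(frame)
    # Undo the window gain so a real sinusoid of amplitude A reads close to A.
    amps = 2.0 * spectrum[strongest] / np.sum(window)
    return freqs, amps

# Usage: a synthetic frame with two sinusoids placed on exact FFT bins.
sample_rate = 8000
n = np.arange(256)
frame = (1.0 * np.cos(2 * np.pi * 500 * n / sample_rate)
         + 0.5 * np.cos(2 * np.pi * 1500 * n / sample_rate))
freqs, amps = sinusoidal_features(frame, sample_rate, num_peaks=2)
```

Each frame is thereby reduced to a few (frequency, amplitude) pairs instead of a full STFT magnitude vector, which is the kind of compact representation the abstract contrasts with STFT features. Real speech would need a more careful estimator than bin-level peak picking, since true sinusoid frequencies rarely fall on exact FFT bins.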

Bibliographic reference. Mowlaee, P. / Saeidi, R. / Tan, Zheng-Hua / Christensen, M. G. / Kinnunen, Tomi / Fränti, P. / Jensen, S. H. (2011): "Sinusoidal approach for the single-channel speech separation and recognition challenge", In INTERSPEECH-2011, 677-680.