It is well known that, in reverberant environments, the human auditory system can pre-process reverberant signals to compensate for reflections and obtain effective cues for improved recognition. In this study, we propose a preprocessing technique of this kind for combined detection and enhancement of speech using a single microphone in reverberant environments, aimed at distant speech applications. The proposed system employs a framework in which the target speech is synthesized using continuous auditory masks estimated from sub-band signals. Linear gammatone analysis/synthesis filter banks serve as the auditory model for sub-band processing. The performance of the proposed system is evaluated on the UT-DistantReverb corpus, which consists of speech recorded in a reverberant racquetball court (T60 ~ 9000 msec). The current system shows an average improvement of 15% STNR over an existing single-channel dereverberation algorithm and a 17% improvement in detecting speech frames over the G729B, SOHN, and Combo-SAD unsupervised speech activity detectors in actual reverberant and noisy environments.
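The sub-band masking framework described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the four centre frequencies, the frame length, and above all the mask rule (a simple noise-floor gain per frame) are assumptions chosen for illustration, and synthesis is simplified to a plain sum of the masked sub-bands rather than a matched gammatone synthesis filter bank.

```python
import math
import random

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, n_taps=128, order=4):
    """Sampled, unit-energy impulse response of a gammatone filter centred at fc."""
    b = 1.019 * erb(fc)
    ir = []
    for n in range(n_taps):
        t = n / fs
        env = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(env * math.cos(2.0 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in ir))
    return [v / norm for v in ir]

def fir_filter(x, h):
    """Causal FIR filtering of x by h (direct-form convolution, same length as x)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y

def frame_mask(band, frame_len):
    """Hypothetical continuous mask in [0, 1]: one gain per frame, computed as
    1 - floor / energy, where floor is the lowest frame energy in the band.
    This is a placeholder, not the mask estimator used in the paper."""
    energies = [sum(v * v for v in band[i:i + frame_len]) + 1e-12
                for i in range(0, len(band), frame_len)]
    floor = min(energies)
    return [max(0.0, 1.0 - floor / e) for e in energies]

def enhance(x, fs, centre_freqs, frame_len=80):
    """Analyse x into gammatone sub-bands, mask each band, and sum the result."""
    out = [0.0] * len(x)
    masks = []
    for fc in centre_freqs:
        band = fir_filter(x, gammatone_ir(fc, fs))
        mask = frame_mask(band, frame_len)
        masks.append(mask)
        for n in range(len(x)):
            out[n] += mask[n // frame_len] * band[n]
    return out, masks

# Toy input: low-level noise throughout, with a 1 kHz tone in the second half.
fs = 8000
rng = random.Random(0)
x = [rng.uniform(-0.01, 0.01) for _ in range(800)]
for n in range(400, 800):
    x[n] += math.sin(2.0 * math.pi * 1000.0 * n / fs)

centre_freqs = [250.0, 500.0, 1000.0, 2000.0]
enhanced, masks = enhance(x, fs, centre_freqs)
```

In this toy setup the 1 kHz band receives a mask close to 1 in the frames that contain the tone and a smaller mask in the noise-only frames. A per-frame mask of this kind could in principle also serve as a speech-presence cue, which is one way to read the "combined detection and enhancement" in the abstract, although the paper's actual detection rule is not stated here.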
Cite as: Kothapally, V., Hansen, J.H.L. (2017) Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments. Proc. Interspeech 2017, 1948-1952, doi: 10.21437/Interspeech.2017-1760
@inproceedings{kothapally17_interspeech,
  author={Vinay Kothapally and John H.L. Hansen},
  title={{Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1948--1952},
  doi={10.21437/Interspeech.2017-1760}
}