The reverberation time, T60, is an important indicator of reverberation strength in a room and has many applications in speech processing, such as dereverberation. However, when only reverberant speech is available, T60 must be estimated blindly. In this paper, we propose a learning-based approach to T60 estimation. We treat T60 estimation as a classification problem by dividing the T60 range into a finite number of bins (e.g., 19 bins covering 0.1 s to 1 s with a bin width of 0.05 s), so that estimation becomes predicting which bin the true T60 of a given speech signal falls into. We use deep neural networks (DNNs) to learn this mapping from speech to T60. The DNN is trained on a large amount of reverberant and noisy speech generated from various simulated rooms with known reverberation times. After training, we observe that the DNN learns highly sensible features for the T60 estimation task. Experimental results on data from both simulated and real rooms confirm the effectiveness of the DNN learning-based approach. In all test cases, the DNN method significantly outperforms the state-of-the-art SDD T60 estimation method.
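The binning step that turns T60 estimation into classification can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the 19 bins are *centred* on 0.1, 0.15, ..., 1.0 s, which is one way to get 19 bins of width 0.05 s over that range. The helper names `t60_to_bin` and `posterior_to_t60` are hypothetical. The first helper produces a training label from a known T60. The second converts a classifier's per-bin posterior, such as a DNN softmax output, into a T60 estimate by taking the centre of the most probable bin.

```python
import numpy as np

# Bin layout from the abstract: 0.1 s to 1.0 s with a bin width of 0.05 s.
T60_MIN = 0.1
T60_MAX = 1.0
BIN_WIDTH = 0.05
NUM_BINS = int(round((T60_MAX - T60_MIN) / BIN_WIDTH)) + 1  # 19 bins

# Assumption (not stated in the source): bin k is centred on T60_MIN + k * BIN_WIDTH.
BIN_CENTRES = T60_MIN + BIN_WIDTH * np.arange(NUM_BINS)


def t60_to_bin(t60):
    """Map a known T60 in seconds to the index of its nearest bin (a class label).

    Values outside the covered range are clipped to the first or last bin.
    """
    idx = np.rint((np.asarray(t60, dtype=float) - T60_MIN) / BIN_WIDTH).astype(int)
    return np.clip(idx, 0, NUM_BINS - 1)


def posterior_to_t60(posterior):
    """Convert per-bin class posteriors into a T60 estimate (centre of the argmax bin)."""
    posterior = np.asarray(posterior, dtype=float)
    return BIN_CENTRES[np.argmax(posterior, axis=-1)]
```

For example, a room with T60 = 0.62 s is labelled as bin 10, whose centre is 0.6 s. The quantization error from the bin width is therefore at most 0.025 s for T60 values inside the covered range.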
Bibliographic reference. Xiao, Xiong / Zhao, Shengkui / Zhong, Xionghu / Jones, Douglas L. / Chng, Eng Siong / Li, Haizhou (2015): "Learning to estimate reverberation time in noisy and reverberant rooms", In INTERSPEECH-2015, 3431-3435.