Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks

Yu Gu, Zhen-Hua Ling, Li-Rong Dai


This paper presents a novel method for speech bandwidth extension (BWE) using deep structured neural networks. To utilize linguistic information during the prediction of high-frequency spectral components, bottleneck (BN) features derived from a deep neural network (DNN)-based state classifier for narrowband speech are employed as auxiliary input. Furthermore, recurrent neural networks (RNNs) incorporating long short-term memory (LSTM) cells are adopted to model the complex mapping relationship between the feature sequences describing low-frequency and high-frequency spectra. Experimental results show that the proposed BWE method achieves better performance than the conventional method based on Gaussian mixture models (GMMs) and the state-of-the-art approach based on DNNs in both objective and subjective tests.
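The abstract describes the overall model structure: per-frame narrowband spectral features are concatenated with BN features from a narrowband state classifier, and an LSTM-based recurrent network regresses the high-frequency spectral features. The sketch below illustrates one way such a regressor could be wired up in PyTorch; all dimensions, layer sizes, and names (BLSTMBWE, lf_dim, bn_dim, hf_dim) are illustrative assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

class BLSTMBWE(nn.Module):
    """Sketch of an LSTM regressor for bandwidth extension.

    Per frame, the input is narrowband (low-frequency) spectral features
    concatenated with bottleneck features taken from a narrowband DNN
    state classifier; the output is the high-frequency spectral features.
    Feature dimensions and layer sizes below are assumed for illustration.
    """

    def __init__(self, lf_dim=41, bn_dim=64, hf_dim=41, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=lf_dim + bn_dim,  # low-frequency + bottleneck features
            hidden_size=hidden,
            num_layers=layers,
            batch_first=True,
        )
        self.proj = nn.Linear(hidden, hf_dim)

    def forward(self, lf_feats, bn_feats):
        # lf_feats: (batch, frames, lf_dim)  narrowband spectral features
        # bn_feats: (batch, frames, bn_dim)  auxiliary bottleneck features
        x = torch.cat([lf_feats, bn_feats], dim=-1)
        h, _ = self.lstm(x)
        return self.proj(h)  # predicted high-frequency spectral features

# Usage example: predict high-band features for a batch of 300-frame utterances.
model = BLSTMBWE()
lf = torch.randn(4, 300, 41)
bn = torch.randn(4, 300, 64)
hf_pred = model(lf, bn)  # shape (4, 300, 41)

Training such a model would typically minimize a frame-level regression loss (e.g., mean squared error) between predicted and reference high-band features; the paper's actual training criterion and feature parameterization are described in the full text.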


DOI: 10.21437/Interspeech.2016-678

Cite as

Gu, Y., Ling, Z., Dai, L. (2016) Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks. Proc. Interspeech 2016, 297-301.

Bibtex
@inproceedings{Gu+2016,
  author={Yu Gu and Zhen-Hua Ling and Li-Rong Dai},
  title={Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks},
  year=2016,
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-678},
  url={http://dx.doi.org/10.21437/Interspeech.2016-678},
  pages={297--301}
}