16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

DNN-Based Speech Bandwidth Expansion and Its Application to Adding High-Frequency Missing Features for Automatic Speech Recognition of Narrowband Speech

Kehuang Li (1), Zhen Huang (1), Yong Xu (2), Chin-Hui Lee (1)

(1) Georgia Institute of Technology, USA
(2) USTC, China

We propose a number of enhancement techniques to improve speech quality in bandwidth expansion (BWE) from narrowband to wideband speech, addressing three issues that can be critical in real-world applications: (1) discontinuity between the narrowband spectrum and the estimated high-frequency spectrum, (2) energy mismatch between testing and training utterances, and (3) bandwidth expansion of out-of-domain speech signals. Because bandwidth-expanded speech inherently provides a prediction of the missing high-frequency features, we also explore the feasibility of adding these estimated features to those extracted from narrowband speech in order to improve automatic speech recognition (ASR) of narrowband speech. Built on a recently proposed deep neural network (DNN) based speech BWE system intended for hearing quality enhancement, these techniques not only improve on the traditionally adopted objective and subjective measures but also reduce the word error rate (WER) from 8.67% when recognizing narrowband speech to 8.26% when recognizing bandwidth-expanded speech, approaching the WER of 8.12% obtained when recognizing wideband speech on the 20,000-word open-vocabulary Wall Street Journal ASR task.
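To put the three WER figures in perspective, the short sketch below derives two quantities that the abstract does not state explicitly: the relative WER reduction over the narrowband baseline, and the fraction of the narrowband-to-wideband gap that is recovered. The function names are illustrative assumptions, not taken from the paper; only the three WER values come from the text above.

```python
# WER figures quoted in the abstract (in percent).
wer_narrowband = 8.67   # recognizing narrowband speech
wer_expanded = 8.26     # recognizing bandwidth-expanded speech
wer_wideband = 8.12     # recognizing wideband speech


def relative_reduction(baseline, improved):
    """Relative WER reduction, as a fraction of the baseline WER."""
    return (baseline - improved) / baseline


def gap_closed(baseline, improved, reference):
    """Fraction of the baseline-to-reference WER gap recovered."""
    return (baseline - improved) / (baseline - reference)


rel = relative_reduction(wer_narrowband, wer_expanded)
gap = gap_closed(wer_narrowband, wer_expanded, wer_wideband)

print(f"relative WER reduction: {rel:.1%}")              # 4.7%
print(f"narrowband-to-wideband gap closed: {gap:.1%}")   # 74.5%
```

Under this reading, the 0.41-point absolute improvement corresponds to roughly a 4.7% relative reduction and recovers about three quarters of the 0.55-point gap between narrowband and wideband recognition.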


Bibliographic reference. Li, Kehuang / Huang, Zhen / Xu, Yong / Lee, Chin-Hui (2015): "DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech", in INTERSPEECH-2015, 2578-2582.