2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)
This paper proposes an alternative scheme for extracting speech features in an automatic speech recognition (ASR) system. If an ASR system is trained on clean speech, a noisy environment may cause a mismatch between the features of the recognition data and those of the training data, and this mismatch degrades recognition accuracy. An approach that, unlike existing speech features, minimizes the mismatch between clean and noisy speech features is therefore needed. We propose a feature extraction technique that is robust to noisy environments. The proposed scheme is based on a weighted histogram of the time-frequency gradient in a Mel-spectrogram image. Unlike previous approaches that use the magnitude of a Mel-spectrogram, we use both the angle and the magnitude of the local gradient by employing a weighted histogram. As a result, the proposed speech feature shows a lower mean square error (MSE) between clean-condition and noisy-condition features than other well-known speech features. In addition, the proposed scheme improves word recognition performance in a noisy environment while using a relatively small number of coefficients compared to similar studies.
Index Terms: automatic speech recognition (ASR), noise robust speech feature, Mel-spectrogram gradient histogram
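The idea summarized in the abstract, a magnitude-weighted histogram of local gradient angles in a Mel-spectrogram, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gradient_histogram_features` and `feature_mse`, the band layout, the number of angle bins, and the frame context width are all hypothetical choices made for illustration, and the abstract does not specify them.

```python
import numpy as np


def gradient_histogram_features(mel_spec, n_bands=4, n_bins=8, context=2):
    """Magnitude-weighted histogram of gradient angles per frame.

    mel_spec: array of shape (n_mels, n_frames), e.g. a log-Mel spectrogram.
    Returns an array of shape (n_frames, n_bands * n_bins).
    n_bands, n_bins and context are illustrative assumptions, not values
    taken from the paper.
    """
    mel_spec = np.asarray(mel_spec, dtype=float)
    # Local gradient along the frequency axis (axis 0) and time axis (axis 1).
    d_freq, d_time = np.gradient(mel_spec)
    magnitude = np.hypot(d_time, d_freq)
    angle = np.arctan2(d_freq, d_time)  # values in [-pi, pi]

    n_mels, n_frames = mel_spec.shape
    band_edges = np.linspace(0, n_mels, n_bands + 1).astype(int)
    features = np.zeros((n_frames, n_bands * n_bins))

    for t in range(n_frames):
        # Time patch around frame t, clipped at the signal boundaries.
        lo, hi = max(0, t - context), min(n_frames, t + context + 1)
        for b in range(n_bands):
            f0, f1 = band_edges[b], band_edges[b + 1]
            # Each cell votes for its angle bin with a weight equal to
            # its gradient magnitude.
            hist, _ = np.histogram(
                angle[f0:f1, lo:hi],
                bins=n_bins,
                range=(-np.pi, np.pi),
                weights=magnitude[f0:f1, lo:hi],
            )
            features[t, b * n_bins:(b + 1) * n_bins] = hist
    return features


def feature_mse(clean_features, noisy_features):
    """Mean square error between two feature arrays of equal shape."""
    diff = np.asarray(clean_features) - np.asarray(noisy_features)
    return float(np.mean(diff ** 2))
```

Given Mel-spectrograms of the same utterance in clean and noisy conditions, `feature_mse(gradient_histogram_features(clean), gradient_histogram_features(noisy))` would give a comparison in the spirit of the MSE measure mentioned in the abstract, although the exact evaluation protocol used in the paper is not stated here.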
Bibliographic reference. Park, Taejin / Beack, Seungkwon / Lee, Taejin (2014): "Noise robust feature for automatic speech recognition based on mel-spectrogram gradient histogram", In SLAM-2014, 67-71.