7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper presents a robust mel-frequency cepstral coefficient (MFCC) feature extraction procedure that combines noise reduction, frame attenuation and RASTA processing. In the preprocessing stage, a hybrid Hamming-Cosine window is applied. To minimize the effect of additive environmental noise on the speech signal, spectral subtraction based on spectral smoothing is used. General mel filtering is then performed on the noise-reduced signal. Speech frames are detected by voice activity detection based on log filter-bank energies, and the log filter-bank magnitudes of noise-only frames are attenuated. To reduce the level of convolutional distortion, RASTA filtering is applied to the log filter-bank energy trajectories. In the final stage, a noise-robust feature vector is created, consisting of 12 mel cepstrum coefficients and the log energy. The proposed front-end was evaluated on the Aurora 2 and Aurora 3 databases with the HTK speech recognition toolkit. It achieves a total improvement of 41.14% (Aurora 2) and 45.06% (Aurora 3) relative to the baseline MFCC front-end.
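The RASTA step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the filter coefficients, so the sketch assumes the classic RASTA transfer function H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), and the function name `rasta_filter` and the `pole` parameter are hypothetical.

```python
import numpy as np

def rasta_filter(log_energies, pole=0.98):
    """Band-pass filter each log filter-bank trajectory along the time axis.

    log_energies: array of shape (num_frames, num_bands).
    The coefficients are the classic RASTA values (an assumption here),
    not values taken from the paper.
    """
    x = np.asarray(log_energies, dtype=float)
    # FIR taps 0.1 * [2, 1, 0, -1, -2]; they sum to zero, so the DC gain is zero.
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    y = np.zeros_like(x)
    for n in range(x.shape[0]):
        acc = np.zeros(x.shape[1])
        # Feed-forward part over the current and four previous frames.
        for k, b in enumerate(numer):
            if n - k >= 0:
                acc += b * x[n - k]
        # Single-pole feedback part (leaky integrator).
        if n > 0:
            acc += pole * y[n - 1]
        y[n] = acc
    return y
```

A fixed channel response multiplies the spectrum, so it appears as a constant offset in each log filter-bank trajectory. Because the filter has zero gain at DC, that offset decays away after an initial transient, which is why this kind of filtering targets convolutional rather than additive distortion.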
Bibliographic reference. Kotnik, Bojan / Vlaj, Damjan / Kacic, Zdravko / Horvat, Bogomir (2002): "Efficient additive and convolutional noise reduction procedures", In ICSLP-2002, 445-448.