12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Zero-Crossing-Based Channel Attentive Weighting of Cepstral Features for Robust Speech Recognition: The ETRI 2011 CHiME Challenge System

Young-Ik Kim, Hoon-Young Cho, Sang-Hun Kim

ETRI, Korea

We present a practical, noise-robust speech recognition system that estimates a target-to-interferers power ratio using a zero-crossing-based binaural model and applies that ratio in a channel attentive missing feature decoder operating in the cepstral domain. In a natural multisource environment, the binaural model extracts spatial cues at each zero-crossing of a filterbank output signal to localize multiple sound sources, and it reliably estimates a ratio mask that segregates the target speech from interfering noise. The system uses gammatone filterbank cepstral coefficients (GFCCs) for recognition, and the channel attentive decoder uses the ratio mask to weight the cepstral features when calculating the output probability during Viterbi decoding. In experiments on the CHiME final test set, our channel attentive GFCC system improves on the baseline recognition result by 12.2% on average; under the noisy training condition, the average improvement reaches 18.8%.
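
As a rough illustration of the weighting idea described above, the following minimal sketch computes a ratio mask from target and interferer power and then scales the per-dimension terms of a diagonal-Gaussian log-likelihood by reliability weights. It is not the authors' implementation: the function names, the diagonal-Gaussian observation model, and the assumption that one weight per cepstral dimension is already available are all hypothetical, and the abstract does not specify how the spectral-channel mask is mapped to cepstral-domain weights.

```python
import math


def ratio_mask(target_power, interferer_power):
    """Ratio mask in [0, 1] for one time-frequency unit (hypothetical helper)."""
    total = target_power + interferer_power
    # Guard against an empty unit with no energy at all.
    return target_power / total if total > 0.0 else 0.0


def weighted_log_likelihood(x, mean, var, weights):
    """Diagonal-Gaussian log-likelihood with one weight per feature dimension.

    Assumption: `weights` already holds a reliability value in [0, 1] for each
    cepstral dimension; how it is derived from the mask is left open here.
    """
    total = 0.0
    for xi, mi, vi, wi in zip(x, mean, var, weights):
        # Standard per-dimension Gaussian log-density.
        log_p = -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        # Unreliable dimensions (small weight) contribute less to the score.
        total += wi * log_p
    return total
```

With all weights set to 1 the function reduces to the ordinary diagonal-Gaussian log-likelihood, and with all weights set to 0 the frame contributes nothing to the path score, which is the sense in which such a decoder "attends" to reliable features.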

Bibliographic reference. Kim, Young-Ik / Cho, Hoon-Young / Kim, Sang-Hun (2011): "Zero-crossing-based channel attentive weighting of cepstral features for robust speech recognition: the ETRI 2011 CHiME challenge system", in INTERSPEECH-2011, 1649-1652.