EUROSPEECH 2003 - INTERSPEECH 2003
Sub-band approaches have been proposed for robust speech recognition: the full-band power spectrum is divided into several sub-bands, and the likelihoods or cepstral vectors of the sub-bands are then merged according to their reliability. In conventional sub-band approaches, correlations across sub-bands are not modeled, and the merging weights can only be set empirically or estimated during training, so they may not match the observed data. These methods also degrade performance on clean speech. We propose a novel sub-band approach in which the frequency sub-bands are multiplied by weighting factors and then merged. This approach accounts for the dependence between sub-bands and is shown to be more robust than both full-band and conventional sub-band approaches. Furthermore, the weighting factors can be obtained by maximum-likelihood estimation, which minimizes the mismatch between the trained models and the observed features. Finally, we evaluate our methods on both the Aurora2 task and the Resource Management task and show consistent performance improvements on both.
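The weighting-and-merging idea can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. Weights are applied to filterbank energies, not to raw power spectra. Cepstra come from a log plus an unnormalized DCT-II. The acoustic model is a single diagonal-covariance Gaussian. The maximum-likelihood weights are found by a coarse grid search with coordinate ascent, standing in for whatever optimizer the authors actually used. All function names (`weighted_cepstrum`, `diag_gauss_loglik`, `ml_subband_weights`) are hypothetical.

```python
import math

def weighted_cepstrum(energies, bands, weights, n_ceps):
    """Scale each sub-band's filterbank energies by its weight, merge them
    back into one full-band vector, then apply log + DCT-II to get cepstra."""
    merged = []
    for (lo, hi), w in zip(bands, weights):
        merged.extend(w * e for e in energies[lo:hi])
    m = len(merged)
    # Floor the energies to avoid log(0) on silent channels.
    logs = [math.log(max(e, 1e-10)) for e in merged]
    return [sum(logs[i] * math.cos(math.pi * n * (i + 0.5) / m) for i in range(m))
            for n in range(n_ceps)]

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))

def ml_subband_weights(frames, bands, mean, var, n_ceps, grid, n_iters=3):
    """Hypothetical ML estimator: coordinate ascent that picks each sub-band
    weight from `grid` to maximize the total log-likelihood of the weighted
    cepstra of all frames under the (assumed) Gaussian model."""
    weights = [1.0] * len(bands)

    def total(ws):
        return sum(diag_gauss_loglik(weighted_cepstrum(f, bands, ws, n_ceps), mean, var)
                   for f in frames)

    for _ in range(n_iters):
        for k in range(len(bands)):
            best_w, best_ll = weights[k], total(weights)
            for w in grid:
                trial = weights[:k] + [w] + weights[k + 1:]
                ll = total(trial)
                if ll > best_ll:
                    best_w, best_ll = w, ll
            weights[k] = best_w
    return weights

# Usage: a "clean" model and an observation whose upper sub-band is 4x too loud.
bands = [(0, 4), (4, 8)]
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mean = weighted_cepstrum(clean, bands, [1.0, 1.0], 8)
var = [1.0] * 8
noisy = clean[:4] + [4.0 * e for e in clean[4:]]
weights = ml_subband_weights([noisy], bands, mean, var, 8, [0.25, 0.5, 1.0, 2.0, 4.0])
print(weights)
```

In this sketch, multiplying a sub-band's energies by a weight adds a constant to that band's log energies. Each weight therefore acts as a band-limited additive correction in the cepstral domain. Because all sub-bands pass through one DCT, changing one weight affects every cepstral coefficient. This coupling is why the weights are re-estimated jointly over several passes rather than independently.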
Bibliographic reference. Zhu, Donglai / Nakamura, Satoshi / Paliwal, Kuldip K. / Wang, Renhua (2003): "Maximum likelihood sub-band weighting for robust speech recognition", In EUROSPEECH-2003, 673-676.