INTERSPEECH 2006 - ICSLP
Spoken document retrieval must support audio corpora recorded under a wide range of conditions (e.g., channels, microphones, noise conditions, and recording media). Speech whose bandwidth varies from recording to recording is one of the most challenging factors for robust speech recognition. The missing-feature reconstruction method is effective for recognizing speech corrupted by additive noise. However, it runs into a problem when applied to band-limited speech, because it assumes that the observations in the unreliable regions are always greater than the latent original clean speech; band-limiting attenuates the cut-off bands, so this assumption does not hold. In this study, we propose to modify the calculation of the marginal probability used for reconstruction so that it depends only on the reliable components. To detect the cut-off regions in incoming speech, we propose a blind mask estimation scheme that employs a synthesized band-limited speech model and requires no training database. Experimental results on Aurora 2.0 and on actual band-limited speech (NGSW corpus) indicate that the proposed method is effective in improving recognition accuracy for band-limited speech. Combined with an adaptation method, it yields a relative improvement of 22.17% on NGSW.
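The idea of computing the marginal probability from the reliable components only can be illustrated with a minimal sketch. It assumes a diagonal-covariance Gaussian mixture model of clean log-spectral features and a reliability mask that is already known; the function name, the parameter names, and the posterior-weighted mean estimator are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import numpy as np

def reconstruct_band_limited(x, reliable, weights, means, variances):
    """Hypothetical helper: fill unreliable bands using reliable bands only.

    x         : (D,) observed log-spectral vector
    reliable  : (D,) boolean mask, True where the band is trusted
    weights   : (K,) mixture weights of the clean-speech GMM
    means     : (K, D) component means
    variances : (K, D) diagonal component variances
    """
    x = np.asarray(x, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)

    # Marginal log-likelihood of each component over reliable dimensions only.
    # The unreliable observations are ignored entirely (no upper bound is used).
    diff = x[reliable] - means[:, reliable]
    var_r = variances[:, reliable]
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var_r) + diff**2 / var_r, axis=1)

    # Component posteriors, normalized in a numerically stable way.
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)
    post = np.exp(log_post)
    post /= post.sum()

    # Replace unreliable bands with the posterior-weighted component means.
    x_hat = x.copy()
    x_hat[~reliable] = post @ means[:, ~reliable]
    return x_hat, post

# Toy example (assumed values): two components, upper two bands cut off.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [10.0, 10.0, 10.0, 10.0]])
variances = np.ones((2, 4))
x = np.array([10.0, 10.0, -50.0, -50.0])
reliable = np.array([True, True, False, False])

x_hat, post = reconstruct_band_limited(x, reliable, weights, means, variances)
```

In this toy case the reliable bands match the second component, so the cut-off bands are restored close to that component's mean, regardless of the very low values observed there. A bounded estimator that treats the observation as an upper limit on the clean speech would instead be constrained by those attenuated values.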
Bibliographic reference. Kim, Wooil / Hansen, John H. L. (2006): "Missing-feature reconstruction for band-limited speech recognition in spoken document retrieval", In INTERSPEECH-2006, paper 1826-Thu1CaP.2.