EUROSPEECH 2003 - INTERSPEECH 2003
This paper investigates the problem of speaker identification in noisy conditions, assuming that there is no prior knowledge about the noise. To confine the effect of the noise on recognition, we use a multi-stream approach to characterize the speech signal, assuming that while all of the feature streams may be affected by the noise, some streams may be less severely affected and thus still provide useful information about the speaker. Recognition decisions are based on the feature streams that are uncontaminated or least contaminated, thereby reducing the effect of the noise. We introduce a novel statistical method, the posterior union model, for selecting reliable feature streams. An advantage of the posterior union model is that knowledge of the structure of the noise is not needed, which provides robustness to time-varying, unpredictable noise corruption. We have tested the new method on the TIMIT database with additive corruption from real-world nonstationary noise; the results obtained are encouraging.
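The stream-selection idea described above can be illustrated with a minimal sketch. It is not the paper's formulation, which the abstract does not give; it assumes one plausible reading in which, for a chosen number of ignored streams (the "order"), each speaker's score sums the joint likelihood over every subset of the remaining streams, scores are normalized into posteriors under equal speaker priors, and the order giving the most confident posterior is kept. The function names, the input layout, and the toy log-likelihood values are all hypothetical.

```python
import math
from itertools import combinations


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def union_posteriors(loglik, order):
    """Speaker posteriors when `order` streams are assumed unreliable.

    loglik[s][n] is the log-likelihood of feature stream n under speaker s
    (a hypothetical input layout). Equal speaker priors are assumed.
    """
    n_streams = len(loglik[0])
    keep = n_streams - order
    scores = []
    for row in loglik:
        # One term per subset of `keep` streams: the joint log-likelihood
        # of that subset, treating streams as independent (an assumption).
        terms = [sum(row[i] for i in subset)
                 for subset in combinations(range(n_streams), keep)]
        scores.append(_logsumexp(terms))
    norm = _logsumexp(scores)
    return [math.exp(s - norm) for s in scores]


def identify(loglik, max_order=None):
    """Return (speaker, order, posterior) with the highest top posterior.

    Choosing the order by maximum posterior is an assumption of this sketch.
    """
    n_streams = len(loglik[0])
    if max_order is None:
        max_order = n_streams - 1  # always keep at least one stream
    best = None
    for order in range(max_order + 1):
        post = union_posteriors(loglik, order)
        speaker = max(range(len(post)), key=post.__getitem__)
        if best is None or post[speaker] > best[2]:
            best = (speaker, order, post[speaker])
    return best


# Toy example: streams 0 and 1 are clean and favour speaker 0;
# stream 2 is corrupted and mildly favours speaker 1.
toy_loglik = [
    [-1.0, -1.0, -50.0],  # speaker 0
    [-3.0, -3.0, -45.0],  # speaker 1
]
speaker, order, posterior = identify(toy_loglik)
```

In this toy case, using all three streams (order 0) favours speaker 1 because of the corrupted stream, while the sketch settles on speaker 0 with one stream ignored. No knowledge of which stream is corrupted is supplied; the sum over subsets covers every possibility.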
Bibliographic reference. Ming, Ji / Stewart, Darryl / Hanna, Philip / Corr, Pat / Smith, Jack / Vaseghi, Saeed (2003): "Robust speaker identification using posterior union models", In EUROSPEECH-2003, 2645-2648.