7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper develops a speech recognition algorithm that is robust against short-time noise whose spectral characteristics are not known in advance. We assume, however, that such noise affects only certain parts of the speech, a condition also known as partial temporal corruption. Examples of short-time noise include a door slam, keyboard clicks, and packet loss in the network transmission of voice. Such noise is found to significantly degrade the performance of an automatic speech recognizer (ASR). In our previous work, we proposed a robust algorithm, called the frame-skipping Viterbi algorithm (FSVA), which ignores the likelihood contributions of the K worst-performing frames during Viterbi decoding. We showed that the FSVA is robust against random replacement of speech frames by Gaussian noise. One important issue that we did not address is how to determine the number of skips. This paper extends our work by applying the FSVA to additive short-time noise with unknown spectral characteristics and unexpected occurrence, under different SNRs, rates of corruption, and lengths of corruption. We also propose a solution for determining the appropriate number of skips based on the log-likelihood ratio.
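The idea of ignoring the likelihood contributions of the K worst frames during Viterbi decoding can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below rests on an assumption: the skip count is tracked as an extra dimension of the dynamic program, and a skipped frame contributes no emission score while the state transition is still taken. The function name `frame_skipping_viterbi` and its parameters are hypothetical and are not taken from the paper.

```python
NEG_INF = float("-inf")

def frame_skipping_viterbi(log_init, log_trans, log_emit, max_skips):
    """Illustrative sketch only (not the authors' implementation).

    log_init[j]     : log prior of starting in state j
    log_trans[i][j] : log transition probability from state i to state j
    log_emit[t][j]  : log likelihood of frame t under state j
    Returns a list whose entry k is the best path log score when the
    emission scores of exactly k frames are ignored.
    """
    n_states = len(log_init)
    # score[k][j]: best log score ending in state j with k frames skipped so far
    score = [[NEG_INF] * n_states for _ in range(max_skips + 1)]
    for j in range(n_states):
        score[0][j] = log_init[j] + log_emit[0][j]  # first frame kept
        if max_skips >= 1:
            score[1][j] = log_init[j]               # first frame skipped

    for t in range(1, len(log_emit)):
        new = [[NEG_INF] * n_states for _ in range(max_skips + 1)]
        for k in range(max_skips + 1):
            for j in range(n_states):
                # Option 1: keep frame t, so its emission score is added.
                keep = max(score[k][i] + log_trans[i][j]
                           for i in range(n_states)) + log_emit[t][j]
                # Option 2: skip frame t (assumed: transition only, no emission).
                skip = NEG_INF
                if k >= 1:
                    skip = max(score[k - 1][i] + log_trans[i][j]
                               for i in range(n_states))
                new[k][j] = max(keep, skip)
        score = new

    return [max(row) for row in score]


# Toy usage: one state, three frames, the middle frame badly corrupted.
scores = frame_skipping_viterbi([0.0], [[0.0]], [[-1.0], [-10.0], [-2.0]], 2)
print(scores)
```

In this toy case, allowing one skip removes the corrupted middle frame from the score. The function returns one score per skip count so that a criterion for choosing the number of skips could be applied afterwards; the abstract says only that the proposed criterion is based on the log-likelihood ratio, so none is implemented here.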
Bibliographic reference. Siu, Manhung / Chan, Yu-Chung (2002): "Robust speech recognition against short-time noise", In ICSLP-2002, 1049-1052.