September 22-25, 1997
Determining the precise moment at which speech begins or ends is an important problem in automatic speech recognition (ASR). As previous work has shown, small deviations from the optimum beginning and ending points cause a large decrease in recognition accuracy. The presence of noise is an added problem, especially when its level is high (around 95 dB in this work) and its characteristics are highly non-stationary, since it can produce false triggers (more likely when the noise includes speech sounds). For this reason, in such conditions it is important to have a pre-processing stage that removes as much noise as possible and provides cues that help to build an end-point detector for those environments. The method presented here is a pre-processing technique for highly noisy, non-stationary environments. While enhancing the speech, it yields an equalised version of the SNR improvement, the Mean Spectral Energy Difference, in which large differences in noise level are reduced to a small ripple, whereas the presence of speech is marked by a large decrease in this measure. After this pre-processing, any end-point detection approach (explicit, implicit or hybrid) may render acceptable results.
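The abstract does not give the exact definition of the Mean Spectral Energy Difference, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the measure is the per-frame average, over frequency bins, of the dB difference between the noisy input power spectrum and the enhanced output power spectrum. It also assumes that noise-only frames dominate the recording, so the median can serve as a baseline. The function names, the `drop_db` threshold and the median baseline are hypothetical and are not taken from the paper.

```python
import math


def mean_spectral_energy_difference(noisy_frames, enhanced_frames, eps=1e-12):
    """Per-frame mean (over frequency bins) of the dB difference between the
    noisy input power spectrum and the enhanced output power spectrum.

    Assumed definition for illustration only; the paper may define it differently.
    """
    msed = []
    for noisy, enhanced in zip(noisy_frames, enhanced_frames):
        diffs = [
            10.0 * math.log10((n + eps) / (e + eps))
            for n, e in zip(noisy, enhanced)
        ]
        msed.append(sum(diffs) / len(diffs))
    return msed


def detect_endpoints(msed, drop_db=6.0):
    """Return (first, last) indices of frames whose value falls at least
    drop_db below the median, or None if no frame does.

    The median baseline assumes that most frames contain noise only.
    """
    baseline = sorted(msed)[len(msed) // 2]
    speech = [i for i, value in enumerate(msed) if value < baseline - drop_db]
    if not speech:
        return None
    return speech[0], speech[-1]
```

In this toy model, frames where the enhancement stage removes a large share of the energy are treated as noise, and a sharp drop in the difference marks candidate speech frames. This corresponds to an explicit detector; an implicit or hybrid approach would consume the same per-frame values differently.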
Bibliographic reference. Martinez, Rafael / Alvarez, Agustin / Gomez Vilda, Pedro / Perez, Mercedes / Nieto, Victor / Rodellar, Victoria (1997): "A speech pre-processing technique for end-point detection in highly non-stationary environments", In EUROSPEECH-1997, 1111-1114.