EUROSPEECH '97
5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997


A Speech Pre-Processing Technique for End-Point Detection in Highly Non-Stationary Environments

Rafael Martinez, Agustin Alvarez, Vilda Pedro Gomez, Mercedes Perez, Victor Nieto, Victoria Rodellar

Departamento de Arquitectura y Tecnologia de Sistemas Informaticos, Universidad Politecnica de Madrid, Campus de Montegancedo, s/n, Boadilla del Monte, Madrid, Spain

Determining the precise moment at which speech begins or ends is an important problem in ASR. As shown in [1], small deviations from the optimal beginning and ending points cause a large drop in recognition accuracy. Noise [2] [3] compounds the problem, especially when its level is high (around 95 dB in this work) and its characteristics are highly non-stationary, since it can trigger false detections (more likely when the noise itself contains speech sounds). Under such conditions it is therefore important to have a pre-processing stage that removes as much noise as possible and provides cues that help build an end-point detector for those environments. The method presented here is a pre-processing technique for highly noisy, non-stationary environments that enhances the speech and, at the same time, yields an equalised version of the SNR improvement, the Mean Spectral Energy Difference. Its main characteristic is that large differences in noise level are reduced to a small ripple, while the presence of speech is marked by a large decrease in the Mean Spectral Energy Difference. After this pre-processing, any end-point detection approach (explicit, implicit or hybrid [3]) may give acceptable results.
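
As an illustration only, the following sketch shows how an explicit end-point detector might use a per-frame trace in which noise appears as a small ripple and speech as a large decrease, as the abstract describes for the Mean Spectral Energy Difference. It is not the authors' implementation: the abstract does not say how the trace is computed, so it is taken here as a given list of values. The function name `detect_endpoints`, the parameters `drop` and `min_frames`, the median baseline (which assumes noise-only frames are the majority) and the example values are all assumptions.

```python
from typing import List, Optional, Tuple


def detect_endpoints(
    msed: List[float],
    drop: float = 6.0,
    min_frames: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (first, last) frame indices of detected speech, or None.

    msed       -- per-frame trace, assumed flat in noise and lower in speech
    drop       -- hypothetical decrease below the baseline that marks speech
                  (same units as the trace)
    min_frames -- hypothetical minimum run length, to reject isolated dips
    """
    if not msed:
        return None

    # Assumption: most frames are noise-only, so the median of the trace
    # approximates the level of the noise ripple.
    baseline = sorted(msed)[len(msed) // 2]

    # A frame is a speech candidate if it falls well below the baseline.
    below = [value < baseline - drop for value in msed]

    # Collect runs of consecutive candidate frames that are long enough.
    # The trailing False closes a run that reaches the end of the trace.
    runs = []
    start = None
    for i, flag in enumerate(below + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                runs.append((start, i - 1))
            start = None

    if not runs:
        return None
    # Begin point: start of the first run. End point: end of the last run.
    return runs[0][0], runs[-1][1]


# Made-up trace: ripple around 20, with a dip over frames 4 to 7.
trace = [20.0, 20.5, 19.8, 20.2, 8.0, 7.5, 9.0, 8.2, 20.1, 19.9, 20.3, 20.0]
endpoints = detect_endpoints(trace)
```

Because the trace is described as equalised with respect to noise level, a single fixed decrease threshold is used here rather than one that adapts to the noise; whether that suffices in practice depends on details given only in the full paper.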

Bibliographic reference.  Martinez, Rafael / Alvarez, Agustin / Gomez, Vilda Pedro / Perez, Mercedes / Nieto, Victor / Rodellar, Victoria (1997): "A speech pre-processing technique for end-point detection in highly non-stationary environments", In EUROSPEECH-1997, 1111-1114.