8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Speech Segregation Based on Fundamental Event Information Using an Auditory Vocoder

Toshio Irino (1), Roy D. Patterson (2), Hideki Kawahara (1)

(1) Wakayama University, Japan
(2) Cambridge University, U.K.

We present a new auditory method for segregating concurrent speech sounds. The system is based on an auditory vocoder developed to resynthesize speech from an auditory Mellin representation using the vocoder STRAIGHT. Unlike conventional window-based processing, the auditory representation preserves fine temporal information, which makes it possible to segregate speech sources with an event-synchronous procedure. We developed a method that converts fundamental frequency information into estimates of glottal pulse times, so as to facilitate robust extraction of the target speech. The results show that segregation is good even at an SNR of 0 dB: the extracted target speech was slightly distorted but entirely intelligible, whereas the distracter speech was reduced to a non-speech sound that was not perceptually disturbing. This auditory vocoder therefore has potential for speech enhancement in applications such as hearing aids.
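The paper gives the details of the event-estimation procedure; as a rough illustration of the underlying idea only (not the authors' algorithm), glottal pulse times can be estimated from a frame-wise F0 contour by integrating instantaneous frequency and placing a pulse at each 2π crossing of the accumulated phase:

```python
import numpy as np

def f0_to_pulse_times(f0, frame_rate):
    """Estimate glottal pulse times from a frame-wise F0 contour.

    Illustrative sketch: f0 is an array of F0 values in Hz (use 0 for
    unvoiced frames), sampled at frame_rate frames per second. The phase
    2*pi*f0 is integrated over time, and a pulse is emitted whenever the
    accumulated phase passes another multiple of 2*pi.
    """
    dt = 1.0 / frame_rate
    phase = np.cumsum(2.0 * np.pi * np.asarray(f0, dtype=float) * dt)
    pulse_times = []
    next_crossing = 2.0 * np.pi
    for i, ph in enumerate(phase):
        while ph >= next_crossing:
            pulse_times.append(i * dt)  # pulse at this frame's time
            next_crossing += 2.0 * np.pi
    return np.array(pulse_times)

# For a steady 100 Hz contour, pulses appear roughly every 10 ms.
f0 = np.full(1000, 100.0)          # 1 s of voiced speech at 100 Hz
pulses = f0_to_pulse_times(f0, 1000)
```

Pulse placement here is quantised to the frame grid; a practical system would interpolate within frames (and, as in the paper, refine the estimates against the signal) to obtain the fine timing that event-synchronous segregation requires.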


Bibliographic reference. Irino, Toshio / Patterson, Roy D. / Kawahara, Hideki (2003): "Speech segregation based on fundamental event information using an auditory vocoder", in EUROSPEECH-2003, 553-556.