12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Using Spectral Fluctuation of Speech in Multi-Feature HMM-Based Voice Activity Detection

Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama

University of Tokyo, Japan

Observation of the speech spectrum shows that speech has a characteristic spectral fluctuation pattern along both time and frequency. In this paper, we exploit this property in a multi-feature approach to voice activity detection (VAD). The effect of separating this speech-specific spectral fluctuation using multi-stage HPSS (Harmonic-Percussive Sound Separation) has been analyzed in comparison with conventional features used in voice activity detection, reducing frame-wise detection error by up to 78%, depending on the SNR conditions and noise type. The multi-feature approach has been tested using Hidden Markov Models to model the feature stream as a sequence, and it has outperformed standard and similar VAD proposals in utterance-based tests intended for automatic speech recognition.
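
As a rough illustration of what modeling the feature stream as a sequence can mean for VAD, the sketch below decodes speech/non-speech labels with a two-state HMM using the Viterbi algorithm. It is not the authors' system: the abstract does not specify the model topology, features, or parameters, so the function name `viterbi_vad`, the two-state layout, the uniform initial prior, the self-transition probability `p_stay`, and the per-frame log-likelihood inputs are all assumptions made for this example.

```python
import math


def viterbi_vad(log_lik, p_stay=0.9):
    """Decode one speech/non-speech label per frame with a two-state HMM.

    log_lik: list of (loglik_nonspeech, loglik_speech) pairs, one per frame.
             How these are computed (e.g. from which features) is left open.
    p_stay:  assumed self-transition probability, shared by both states.
    Returns a list of labels: 0 = non-speech, 1 = speech.
    """
    if not log_lik:
        return []

    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    # Assumed uniform prior over the initial state.
    score = [math.log(0.5) + log_lik[0][0], math.log(0.5) + log_lik[0][1]]
    backptr = []

    for frame in log_lik[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Score of reaching state s from each possible previous state r.
            cands = [
                score[r] + (log_stay if r == s else log_switch)
                for r in (0, 1)
            ]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + frame[s])
        score = new_score
        backptr.append(ptr)

    # Backtrack from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: frame 3 weakly favors non-speech inside a speech segment.
frames = [(-1, -5), (-1, -5), (-5, -1), (-1, -1.5), (-5, -1), (-1, -5)]
labels = viterbi_vad(frames)
```

In this toy input a frame-by-frame decision would flip to non-speech at frame 3, whereas the sequence model keeps the speech segment contiguous because switching states twice costs more than the weak evidence gained.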

Bibliographic reference.  Espi, Miquel / Miyabe, Shigeki / Nishimoto, Takuya / Ono, Nobutaka / Sagayama, Shigeki (2011): "Using spectral fluctuation of speech in multi-feature HMM-based voice activity detection", In INTERSPEECH-2011, 2613-2616.