Observation of the speech spectrum shows that speech has a characteristic pattern of spectral fluctuation along both time and frequency. In this paper, we exploit this property within a multi-feature approach to voice activity detection (VAD). We analyze how separating this characteristic fluctuation with multi-stage HPSS (Harmonic-Percussive Sound Separation) affects conventional VAD features, and find that it reduces the frame-wise detection error by up to 78%, depending on the SNR and the noise type. The multi-feature approach was evaluated with Hidden Markov Models (HMMs) that model the feature stream as a sequence, and it outperformed standard and comparable VAD methods in utterance-based tests aimed at automatic speech recognition.
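The abstract does not spell out the separation algorithm. As a rough illustration of how spectral components can be split by how fast they fluctuate, the sketch below substitutes median-filtering HPSS (FitzGerald, 2010) for the HPSS formulation used by the authors, and realizes the "multi-stage" idea by changing the median-filter length between stages. That staging, the kernel sizes, and the function names (`hpss_median`, `two_stage_hpss`, `middle_band_ratio`) are assumptions made for this example, as is the intuition that speech-like energy lands between very stationary and transient energy. It operates on a non-negative magnitude spectrogram laid out as frequency × time.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_axis(x, size, axis):
    """Median-filter a 2-D array along one axis (odd `size`, edge padding)."""
    pad = size // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(x, pad_width, mode="edge")
    # sliding_window_view appends the window axis last; taking the median
    # over it restores the original shape of x.
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def hpss_median(S, time_size=17, freq_size=17, power=2.0, eps=1e-12):
    """Median-filtering HPSS on a magnitude spectrogram S (freq x time).

    Returns (H, P) with H + P == S, using soft Wiener-style masks.
    """
    H_est = median_filter_axis(S, time_size, axis=1)  # smooth across time: sustained parts
    P_est = median_filter_axis(S, freq_size, axis=0)  # smooth across frequency: transients
    Hp, Pp = H_est ** power, P_est ** power
    mask_h = Hp / (Hp + Pp + eps)
    return mask_h * S, (1.0 - mask_h) * S


def two_stage_hpss(S, long_size=31, short_size=5):
    """Hypothetical two-stage split into stationary, mid-fluctuation and transient parts."""
    # Stage 1: a long time kernel keeps only very stationary energy in H1.
    H1, rest = hpss_median(S, time_size=long_size, freq_size=short_size)
    # Stage 2: a short time kernel splits the residual into moderately
    # fluctuating energy (H2) and transient energy (P2).
    H2, P2 = hpss_median(rest, time_size=short_size, freq_size=short_size)
    return H1, H2, P2


def middle_band_ratio(S, eps=1e-12):
    """Hypothetical per-frame VAD feature: energy share of the middle component."""
    _, H2, _ = two_stage_hpss(S)
    return (H2 ** 2).sum(axis=0) / ((S ** 2).sum(axis=0) + eps)
```

On a synthetic spectrogram with a steady tone in one frequency row and a click in one time column, `hpss_median` assigns the tone to `H` and the click to `P`. Soft masks are used so that the separated parts always sum back to the input, which keeps per-frame energy ratios such as `middle_band_ratio` bounded between 0 and 1.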
Bibliographic reference. Espi, Miquel / Miyabe, Shigeki / Nishimoto, Takuya / Ono, Nobutaka / Sagayama, Shigeki (2011): "Using spectral fluctuation of speech in multi-feature HMM-based voice activity detection", In INTERSPEECH-2011, 2613-2616.