This paper presents a novel voice activity detection (VAD) approach based on incremental subspace learning using harmonicity-based features. Harmonic structure is well known as a noise-robust speech feature, and we develop a novel harmonicity-based feature built on temporal-spectral co-occurrence patterns. At the statistical decision stage, many conventional statistical VAD methods rely on a Gaussian model; however, owing to the non-Gaussian nature of speech, the Gaussian assumption breaks down and produces incorrect VAD results. We therefore reformulate VAD as an incremental subspace learning problem. The candid covariance-free incremental PCA (CCIPCA) method is employed to adaptively model the input sound by a subspace. A speech activity measure is then established based on the distance from the input sound to the adaptive subspace. Notably, the CCIPCA subspace update interval is set to 0.5 seconds in this work, and the deviation distance is computed afterwards. At such a short time scale, environmental sound presents a more Gaussian-like, stationary pattern and can therefore be well accommodated by the adaptive subspace. Conversely, speech exhibits non-stationary characteristics that lead to a distinct deviation from the adaptive acoustic subspace, and thus it can be effectively distinguished. We experimentally compared our scheme with various VAD methods on real-world data. The results validate the effectiveness of the proposed approach.
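The subspace-distance idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes frame-level feature vectors are already available as NumPy arrays, omits the CCIPCA amnesic parameter for simplicity, and does not reproduce the harmonicity-based feature. The names `IncrementalSubspace`, `update`, and `distance` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guard against division by zero


class IncrementalSubspace:
    """Minimal CCIPCA-style subspace tracker (amnesic parameter omitted)."""

    def __init__(self, dim, n_components):
        self.n = 0                               # samples seen so far
        self.mean = np.zeros(dim)                # running mean of the input
        self.V = np.zeros((n_components, dim))   # unnormalised eigenvector estimates

    def update(self, x):
        """Fold one feature vector into the subspace estimate."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        n = self.n
        self.mean += (x - self.mean) / n
        u = x - self.mean                        # centred sample
        for i in range(self.V.shape[0]):
            v = self.V[i]
            v_norm = np.linalg.norm(v)
            if v_norm < EPS:
                # Component not initialised yet: seed it with the current
                # residual and leave later components for later samples.
                self.V[i] = u
                break
            # Covariance-free update: blend the old estimate with the
            # sample weighted by its projection onto the old direction.
            self.V[i] = ((n - 1) / n) * v + (1.0 / n) * (u @ v / v_norm) * u
            unit = self.V[i] / max(np.linalg.norm(self.V[i]), EPS)
            # Deflate: remove this component before estimating the next.
            u = u - (u @ unit) * unit

    def distance(self, x):
        """Norm of the part of x that the current subspace cannot explain."""
        u = np.asarray(x, dtype=float) - self.mean
        active = self.V[np.linalg.norm(self.V, axis=1) > EPS]
        if active.shape[0] == 0:
            return float(np.linalg.norm(u))
        # Orthonormalise the estimates before projecting, since CCIPCA
        # directions are only approximately orthogonal.
        q, _ = np.linalg.qr(active.T)
        return float(np.linalg.norm(u - q @ (q.T @ u)))
```

In such a scheme, each frame's `distance` would be compared against a threshold, with large values suggesting speech and small values suggesting background sound that the subspace already explains. How the update interval and the threshold are chosen is left open in this sketch.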
Bibliographic reference. Ye, Jiaxing / Kobayashi, Takumi / Murakawa, Masahiro / Higuchi, Tetsuya (2013): "Incremental acoustic subspace learning for voice activity detection using harmonicity-based features", In INTERSPEECH-2013, 695-699.