Interspeech'2005 - Eurospeech
This paper presents a content-based approach to audio source indexing that requires no prior knowledge of the sources. Empirical mode decomposition (EMD), which can decompose nonlinear and non-stationary signals into a set of bases, is used to implement a sub-band approach to audio discrimination. Feature vectors are derived from each of the selected sub-bands of the target frame. Linear predictive cepstrum coefficients (LPCC) serve as the main feature vector, and the Kullback-Leibler divergence (KLd) is used as the scoring function to measure the similarity of the feature vectors. Higher-order statistics (HOS) are employed to compute the LPCC, which makes the LPCC less sensitive to Gaussian noise. The experimental results show that the sub-band approach achieves better discrimination efficiency than the full-band technique. The discrimination method is also suitable for resolving the source permutation ambiguity that arises when separating multiple concurrent moving sources from the mixture(s).
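The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the KLd is evaluated on the feature vectors, so the sketch assumes (this is an assumption, not the paper's stated method) that the feature vectors of each sub-band are modelled as a diagonal Gaussian, and that per-band scores are combined by summation. The function names `gaussian_stats`, `kl_diag_gauss`, `symmetric_kl`, and `multiband_score` are hypothetical; the EMD and HOS-based LPCC extraction steps are not shown, and the inputs are taken to be lists of feature vectors already computed per sub-band.

```python
import math
from statistics import fmean, pvariance


def gaussian_stats(frames):
    """Per-dimension mean and variance of a list of feature vectors.

    Assumption: a diagonal Gaussian model per sub-band; the abstract
    does not specify the distribution used for the KLd.
    """
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Floor the variance to avoid division by zero on constant features.
    variances = [max(pvariance(d), 1e-8) for d in dims]
    return means, variances


def kl_diag_gauss(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(frames_a, frames_b):
    """Symmetrised KL divergence between two sets of feature vectors."""
    p, q = gaussian_stats(frames_a), gaussian_stats(frames_b)
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)


def multiband_score(bands_a, bands_b):
    """Sum the symmetric KL over corresponding sub-bands (hypothetical rule)."""
    return sum(symmetric_kl(a, b) for a, b in zip(bands_a, bands_b))
```

Under these assumptions, a lower score indicates that two segments are more likely to come from the same source; a full-band variant would correspond to calling `symmetric_kl` once on features from the undecomposed signal.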
Bibliographic reference. Molla, Md. Khademul Islam / Hirose, Keikichi / Minematsu, Nobuaki (2005): "Multi-band approach of audio source discrimination with empirical mode decomposition", In INTERSPEECH-2005, 673-676.