Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation

K.V. Vijay Girish, A.G. Ramakrishnan, T.V. Ananthapadmanabha


In the analysis of recorded conversations, one motivation is to identify the nature of the background noise as a clue to the possible geographical location of a speaker. In a high-noise environment, to minimize manual analysis of the recording, it is also desirable to automatically locate only those segments of the recording that contain speech. The next task is to identify whether the speech is from one of the known people. A dictionary learning and block sparsity based source recovery approach has been used to estimate the SNR of noisy speech recordings, simulated at different SNRs using ten different noise sources. Given a test utterance, a noise label is assigned using a block sparsity approach, and subsequently, the speaker is classified using the sum of weights recovered from the concatenation of the speaker dictionaries and the identified noise source dictionary. Using the dictionaries of the identified speaker and noise source, framewise speech and noise energies are estimated using a source recovery method. The energy estimates are then used to identify the segments where speech is present. We obtain 100% accuracy for background classification and around 90% for speaker classification at an SNR of 10 dB.
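The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the recovery algorithm, so orthogonal matching pursuit (OMP) is used here as a stand-in, and random unit-norm dictionaries replace learned ones. All function names (`unit_dict`, `omp`, `classify_block`, `frame_energies`, `snr_db`) and all parameter values are hypothetical.

```python
import numpy as np

def unit_dict(rng, dim, n_atoms):
    """Random unit-norm atoms; a placeholder for a learned dictionary."""
    D = rng.standard_normal((dim, n_atoms))
    return D / np.linalg.norm(D, axis=0)

def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit (assumed solver): sparse x with y ~= D @ x."""
    x = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(n_nonzero):
        # Stop early once the frame is already explained.
        if np.linalg.norm(residual) <= tol * max(np.linalg.norm(y), 1.0):
            break
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

def classify_block(dicts, y, n_nonzero=5):
    """Label y by the dictionary block with the largest sum of absolute weights."""
    D = np.hstack(dicts)
    x = omp(D, y, n_nonzero)
    edges = np.cumsum([0] + [d.shape[1] for d in dicts])
    scores = [np.abs(x[a:b]).sum() for a, b in zip(edges[:-1], edges[1:])]
    return int(np.argmax(scores)), x

def frame_energies(D_speech, D_noise, frames, n_nonzero=5):
    """Framewise energies of the recovered speech and noise components."""
    D = np.hstack([D_speech, D_noise])
    n_s = D_speech.shape[1]
    e_speech, e_noise = [], []
    for y in frames:
        x = omp(D, y, n_nonzero)
        e_speech.append(np.sum((D_speech @ x[:n_s]) ** 2))
        e_noise.append(np.sum((D_noise @ x[n_s:]) ** 2))
    return np.array(e_speech), np.array(e_noise)

def snr_db(e_speech, e_noise):
    """Overall SNR estimate in dB from the framewise energy estimates."""
    return 10.0 * np.log10(e_speech.sum() / max(e_noise.sum(), 1e-12))

# Toy demonstration on synthetic data (not real speech or noise).
rng = np.random.default_rng(0)
noise_dicts = [unit_dict(rng, 256, 20) for _ in range(3)]
frame = 1.0 * noise_dicts[1][:, 3] + 0.8 * noise_dicts[1][:, 7]
label, _ = classify_block(noise_dicts, frame, n_nonzero=2)
print("assigned noise block:", label)
```

In this sketch the same `classify_block` routine could serve both stages of the hierarchy: first over the noise dictionaries alone, then over the speaker dictionaries concatenated with the identified noise dictionary. Thresholding the framewise speech energy returned by `frame_energies` would be one simple way to mark frames where speech is present; the threshold itself would be a further assumption.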


DOI: 10.21437/Interspeech.2016-175

Cite as

Girish, K.V.V., Ramakrishnan, A.G., Ananthapadmanabha, T.V. (2016) Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation. Proc. Interspeech 2016, 2972-2976.

Bibtex
@inproceedings{Girish+2016,
author={K.V. Vijay Girish and A.G. Ramakrishnan and T.V. Ananthapadmanabha},
title={Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-175},
url={http://dx.doi.org/10.21437/Interspeech.2016-175},
pages={2972--2976}
}