Recognition and classification of speech content in everyday environments is challenging due to the large diversity of real-world noise sources, which may also include competing speech. At signal-to-noise ratios (SNRs) below 0 dB, a majority of features may become corrupted, severely degrading the performance of classifiers built upon clean observations of a target class. As the energy and complexity of competing sources increase, their explicit modelling becomes integral to successful detection and classification of target speech. We have previously demonstrated that non-negative compositional modelling in a spectrogram space is suitable for robust recognition of speech and speakers even at low SNRs. In this work, the sparse coding approach is extended to cover the whole separation and classification chain in order to recognise the speaker of short utterances in difficult noise environments. A convolutive matrix factorisation and coding system is evaluated on the 2nd CHiME Track 1 data. Over 98% average speaker recognition accuracy is achieved for utterances shorter than three seconds at SNRs from +9 to -6 dB, illustrating the system's performance in challenging conditions.
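The compositional modelling idea described above can be illustrated with a minimal sketch. It is not the system evaluated in the paper: it codes a single spectral frame rather than using convolutive (multi-frame) atoms, it uses a generic multiplicative update for the Kullback-Leibler divergence with an L1 sparsity penalty, and the dictionary, the speaker labels, the function names and all numeric values are hypothetical toy choices made only for illustration.

```python
def sparse_code(x, atoms, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activations a such that x ~ sum_k a[k] * atoms[k].

    Generic multiplicative update for KL divergence plus an L1 penalty.
    The dictionary (atoms) is kept fixed; only the activations are updated.
    """
    n_atoms = len(atoms)
    n_bins = len(x)
    a = [1.0] * n_atoms
    atom_sums = [sum(atom) for atom in atoms]
    for _ in range(n_iter):
        # Current model of the observation as a weighted sum of atoms.
        model = [
            sum(a[k] * atoms[k][f] for k in range(n_atoms)) + eps
            for f in range(n_bins)
        ]
        ratio = [xf / mf for xf, mf in zip(x, model)]
        # Multiplicative update keeps every activation non-negative.
        a = [
            a[k] * sum(r * b for r, b in zip(ratio, atoms[k]))
            / (atom_sums[k] + sparsity)
            for k in range(n_atoms)
        ]
    return a


def classify_speaker(activations, labels, noise_label="noise"):
    """Pick the speaker whose atoms received the largest total activation."""
    scores = {}
    for weight, label in zip(activations, labels):
        if label != noise_label:
            scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get), scores


# Hypothetical toy dictionary: one atom per speaker plus one noise atom,
# each a magnitude spectrum over five frequency bins.
atoms = [
    [4.0, 1.0, 0.0, 0.0, 1.0],  # speaker_a
    [0.0, 1.0, 4.0, 0.0, 1.0],  # speaker_b
    [1.0, 1.0, 1.0, 5.0, 1.0],  # noise
]
labels = ["speaker_a", "speaker_b", "noise"]

# Observation: speaker_a mixed with noise at twice the weight, so the
# noise component carries more energy than the speech component.
x = [s + 2.0 * n for s, n in zip(atoms[0], atoms[2])]

activations = sparse_code(x, atoms)
speaker, scores = classify_speaker(activations, labels)
print(speaker, scores)
```

Because the noise atom is part of the dictionary, the noise energy is explained explicitly instead of corrupting the speaker evidence; the decision is then made from the speaker atoms' activations alone.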
Bibliographic reference. Hurmalainen, Antti / Saeidi, Rahim / Virtanen, Tuomas (2015): "Noise robust speaker recognition with convolutive sparse coding", In INTERSPEECH-2015, 244-248.