Spectrogram factorisation using a dictionary of spectro-temporal atoms has been successfully employed to separate a mixed audio signal into its source components. When atoms from multiple sources are included in a combined dictionary, the relative weights of the activated atoms reveal the likely sources as well as the content of each source. Enforcing sparsity on the activation weights produces solutions in which only a small number of atoms are active at a time. In this paper, we propose using group sparsity to restrict the simultaneous activation of sources. This allows us to discover the identity of an unknown speaker among multiple candidates, and further to recognise the phonetic content more reliably using a narrowed-down subset of atoms belonging to the most likely speakers. An evaluation on the CHiME corpus shows that group sparsity improves the results of noise-robust speaker identification and speech recognition using speaker-dependent models.
Index Terms: group sparsity, speech recognition, speaker identification, spectrogram factorization
Bibliographic reference. Hurmalainen, Antti / Saeidi, Rahim / Virtanen, Tuomas (2012): "Group sparsity for speaker identity discrimination in factorisation-based speech recognition", In INTERSPEECH-2012, 2138-2141.
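
The group-sparsity idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single non-negative observation vector rather than a spectrogram of spectro-temporal atoms, a KL-divergence cost with heuristic multiplicative updates, and a sum-of-group-L2-norms penalty. The function names, the penalty weight `lam`, and the synthetic random dictionary are all hypothetical choices made for illustration.

```python
import numpy as np


def group_sparse_activations(x, D, groups, lam=0.5, n_iter=500, eps=1e-12):
    """Estimate non-negative weights a such that x is approximately D @ a.

    x      : observation vector, shape (n_bins,)
    D      : non-negative dictionary, shape (n_bins, n_atoms)
    groups : integer speaker index of each atom, shape (n_atoms,)
    lam    : weight of the group penalty sum_g ||a_g||_2 (assumed form)
    """
    groups = np.asarray(groups)
    n_groups = int(groups.max()) + 1
    a = np.ones(D.shape[1])
    col_sums = D.sum(axis=0)  # D^T 1, the KL denominator term
    for _ in range(n_iter):
        ratio = x / (D @ a + eps)
        # L2 norm of the activations within each speaker group
        norms = np.sqrt(np.bincount(groups, weights=a**2, minlength=n_groups))
        # Gradient of the group penalty; non-negative, so it can sit
        # in the denominator of the multiplicative update.
        penalty_grad = a / (norms[groups] + eps)
        a *= (D.T @ ratio) / (col_sums + lam * penalty_grad + eps)
    return a


def group_mass(a, groups):
    """Total activation weight assigned to each speaker group."""
    groups = np.asarray(groups)
    return np.bincount(groups, weights=a, minlength=int(groups.max()) + 1)


# Synthetic demo: three hypothetical speakers with five atoms each.
rng = np.random.default_rng(0)
n_bins, atoms_per_speaker, n_speakers = 64, 5, 3
groups = np.repeat(np.arange(n_speakers), atoms_per_speaker)
D = rng.random((n_bins, groups.size)) ** 3  # random non-negative atoms

# The observation is built only from the atoms of speaker 1.
true_speaker = 1
a_true = np.zeros(groups.size)
a_true[groups == true_speaker] = rng.random(atoms_per_speaker) + 0.5
x = D @ a_true

a_hat = group_sparse_activations(x, D, groups)
mass = group_mass(a_hat, groups)
print("activation mass per speaker:", mass)
print("most likely speaker:", int(mass.argmax()))
```

In this sketch the speaker whose group collects the most activation mass is taken as the identity hypothesis. A recogniser could then restrict decoding to that group's atoms, which is the narrowing step the abstract describes.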