Novel approaches using high-level features have recently emerged in the speaker recognition field. They consist of modeling speakers with linguistic features such as words, phonemes, or idiolects. The benefit of these features has been demonstrated in NIST evaluation campaigns. Their main disadvantage is that they need a large amount of data to be effective. The purpose of this study is to generalize this approach by using acoustic events, generated by a GMM, as input features. We derive a methodology to build a dictionary of such events and to model speakers using symbol sequences drawn from this dictionary. Experiments on the NIST SRE 2004 database show that the information produced is speaker specific, and that fusing it with a GMM verification system improves performance.
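The abstract does not give the exact dictionary construction or scoring rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each feature frame is labeled with its top-scoring component of a diagonal-covariance GMM (the "acoustic event"), that runs of identical labels are merged, and that a speaker is modeled by add-alpha smoothed bigram frequencies scored against a background model. All function names (`frame_to_events`, `collapse_repeats`, `ngram_counts`, `ngram_llr`) and these modeling choices are hypothetical illustrations, not the authors' method.

```python
import numpy as np
from collections import Counter

def frame_to_events(frames, weights, means, variances):
    """Label each frame (T x D) with the index of its most likely component
    in a K-component diagonal-covariance GMM (assumed event definition)."""
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, K)
    # Unnormalized log joint log w_k + log N(x | mu_k, Sigma_k); its argmax
    # over k equals the argmax of the component posterior.
    log_joint = np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)
    return np.argmax(log_joint, axis=1)                         # (T,)

def collapse_repeats(events):
    """Hypothetical step: merge consecutive identical labels into one event."""
    out = []
    for e in events:
        e = int(e)
        if not out or out[-1] != e:
            out.append(e)
    return out

def ngram_counts(symbols, n=2):
    """Count the n-grams (tuples of event indices) in a symbol sequence."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def ngram_llr(test_symbols, speaker_counts, background_counts, n=2, alpha=1.0):
    """Mean log-likelihood ratio of the test n-grams under add-alpha smoothed
    speaker and background n-gram distributions (assumed scoring rule)."""
    test_counts = ngram_counts(test_symbols, n)
    if not test_counts:
        return 0.0
    # Shared vocabulary so unseen n-grams still get non-zero probability.
    vocab_size = len(set(speaker_counts) | set(background_counts) | set(test_counts))
    spk_total = sum(speaker_counts.values())
    bg_total = sum(background_counts.values())
    score = 0.0
    for gram, count in test_counts.items():
        p_spk = (speaker_counts[gram] + alpha) / (spk_total + alpha * vocab_size)
        p_bg = (background_counts[gram] + alpha) / (bg_total + alpha * vocab_size)
        score += count * np.log(p_spk / p_bg)
    return score / sum(test_counts.values())
```

A positive `ngram_llr` means the test event sequence is better explained by the speaker's n-gram statistics than by the background ones. Merging repeated labels is one way to keep bigram counts from being dominated by self-transitions between adjacent frames. The resulting score could then be fused with a conventional GMM system's score, for example through a weighted sum, though the fusion scheme used in the paper is not specified here.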
Cite as: Scheffer, N., Bonastre, J.-F. (2005) Speaker detection using acoustic event sequences. Proc. Interspeech 2005, 3065-3068, doi: 10.21437/Interspeech.2005-657
@inproceedings{scheffer05_interspeech, author={Nicolas Scheffer and Jean-François Bonastre}, title={{Speaker detection using acoustic event sequences}}, year=2005, booktitle={Proc. Interspeech 2005}, pages={3065--3068}, doi={10.21437/Interspeech.2005-657} }