We propose a speaker verification method that uses non-audible murmur (NAM) segments. NAM differs from normal speech and is hard for other people to overhear. To exploit this property, we adopt a text-dependent verification strategy in which each user utters his or her own keyword phrase, so that the system can use keyword-specific as well as speaker-specific acoustic information. We expect this strategy to yield relatively high performance. To capture keyword-specific acoustic information well, we use NAM segments, each consisting of multiple short-term feature vectors, as input vectors. Because such segments have a large number of dimensions, we use a support vector machine (SVM) to handle them. In experiments on NAM data from 19 male and 10 female speakers recorded in three different sessions, we achieved equal error rates of 0.04% (male) and 1.1% (female) with 145-ms-long NAM segments. These rates are half or less of those obtained with 25-ms-long input vectors.
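The segment construction described in the abstract, in which several short-term feature vectors are concatenated into one high-dimensional input vector, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `stack_frames` and `segment_duration_ms` are hypothetical, and the 10 ms frame shift is an assumption that the abstract does not state; it is chosen only because 13 frames of 25 ms at a 10 ms shift span exactly 145 ms. The resulting segment vectors are the kind of input an SVM classifier would then be trained on.

```python
import numpy as np


def segment_duration_ms(n_context, frame_len_ms=25, frame_shift_ms=10):
    """Time span covered by n_context consecutive frames.

    frame_shift_ms=10 is an assumed value, not taken from the paper.
    """
    return frame_len_ms + (n_context - 1) * frame_shift_ms


def stack_frames(frames, n_context):
    """Concatenate every run of n_context consecutive frames into one vector.

    frames: array of shape (n_frames, n_dims), one short-term feature
            vector per row.
    Returns an array of shape (n_frames - n_context + 1, n_context * n_dims),
    or an empty array if there are fewer than n_context frames.
    """
    frames = np.asarray(frames)
    n_frames, n_dims = frames.shape
    n_segments = n_frames - n_context + 1
    if n_segments <= 0:
        return np.empty((0, n_context * n_dims), dtype=frames.dtype)
    # Row i of idx holds the frame indices i, i+1, ..., i+n_context-1.
    idx = np.arange(n_segments)[:, None] + np.arange(n_context)[None, :]
    # frames[idx] has shape (n_segments, n_context, n_dims); flatten the
    # last two axes so each segment becomes a single long vector.
    return frames[idx].reshape(n_segments, n_context * n_dims)


if __name__ == "__main__":
    # Toy data: 20 frames of 12-dimensional features (dimensions are arbitrary).
    toy = np.random.default_rng(0).normal(size=(20, 12))
    segments = stack_frames(toy, n_context=13)
    print(segments.shape, segment_duration_ms(13))
```

Stacking multiplies the input dimensionality by the number of frames per segment, which is the reason the abstract gives for choosing a classifier that copes well with high-dimensional inputs.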
Cite as: Kojima, M., Matsui, T., Kawanami, H., Saruwatari, H., Shikano, K. (2006) Speaker verification with non-audible murmur segments. Proc. Interspeech 2006, paper 1773-Wed3CaP.10, doi: 10.21437/Interspeech.2006-194
@inproceedings{kojima06_interspeech,
  author={Mariko Kojima and Tomoko Matsui and Hiromichi Kawanami and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{Speaker verification with non-audible murmur segments}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1773-Wed3CaP.10},
  doi={10.21437/Interspeech.2006-194}
}