Whisper to Neutral Mapping Using Cosine Similarity Maximization in i-Vector Space for Speaker Verification

Abinay Reddy Naini, Achuth Rao M.V., Prasanta Kumar Ghosh


In this work, we propose a novel feature mapping (FM) from whispered to neutral speech features using a cosine-similarity-based objective function for speaker verification (SV) using whispered speech. Typically, the performance of an SV system enrolled with neutral speech degrades significantly when tested using whispered speech, due to the differences between the spectral characteristics of neutral and whispered speech. We hypothesize that FM from whispered Mel frequency cepstral coefficients (MFCC) to neutral MFCC by maximizing the cosine similarity between neutral and whispered i-vectors yields better performance than the baseline method, which typically performs a direct FM between MFCC features by minimizing the mean squared error (MSE). We also explore an affine transform between MFCC features using the proposed objective function. Whisper SV experiments with 1882 speakers reveal that the equal error rate (EER) using the proposed method is lower than that using the best baseline by ~24% (relative). We show that, unlike the baseline methods, the proposed FM system maintains the neutral SV performance while improving the EER of whispered SV. We also show that the bias in the learned affine transform corresponds to the glottal flow information, which is absent in whispered speech.
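
The contrast between the two objectives can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy dimensions, and in particular `toy_embedding` (a mean over frames followed by a fixed linear projection, used here as a hypothetical stand-in for a real i-vector extractor) are assumptions made only for illustration. The sketch applies a frame-wise affine transform to whispered features and evaluates either a cosine-similarity loss in the embedding space or a frame-level MSE loss, the latter assuming time-aligned whispered and neutral frames.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def affine_map(mfcc, A, b):
    """Frame-wise affine transform: (T, D) features -> (T, D) features."""
    return mfcc @ A.T + b

def toy_embedding(mfcc, P):
    """Hypothetical stand-in for an i-vector extractor (NOT a real one):
    average the frames, then apply a fixed linear projection P."""
    return P @ mfcc.mean(axis=0)

def cosine_loss(whisper_mfcc, neutral_mfcc, A, b, P):
    """1 - cosine similarity between the embedding of the mapped whispered
    features and the embedding of the neutral features."""
    mapped = affine_map(whisper_mfcc, A, b)
    return 1.0 - cosine_similarity(toy_embedding(mapped, P),
                                   toy_embedding(neutral_mfcc, P))

def mse_loss(whisper_mfcc, neutral_mfcc, A, b):
    """Baseline-style frame-level MSE; assumes time-aligned frames."""
    return float(np.mean((affine_map(whisper_mfcc, A, b) - neutral_mfcc) ** 2))

# Synthetic data (assumed sizes): T frames, D-dim features, K-dim embedding.
rng = np.random.default_rng(0)
T, D, K = 50, 4, 3
A_true = np.eye(D) + 0.1 * rng.standard_normal((D, D))
b_true = rng.standard_normal(D)
P = rng.standard_normal((K, D))

neutral = rng.standard_normal((T, D))
# Build "whispered" features so that the true affine map recovers neutral.
whisper = (neutral - b_true) @ np.linalg.inv(A_true).T

loss_true = cosine_loss(whisper, neutral, A_true, b_true, P)
loss_identity = cosine_loss(whisper, neutral, np.eye(D), np.zeros(D), P)
mse_true = mse_loss(whisper, neutral, A_true, b_true)
```

In this toy setup the cosine loss only constrains the utterance-level embedding, so it needs no frame alignment, whereas the MSE loss compares features frame by frame. In practice the parameters `A` and `b` would be learned by gradient-based optimization through the actual i-vector extractor, which this sketch does not attempt.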


DOI: 10.21437/Interspeech.2019-2280

Cite as: Naini, A.R., M.V., A.R., Ghosh, P.K. (2019) Whisper to Neutral Mapping Using Cosine Similarity Maximization in i-Vector Space for Speaker Verification. Proc. Interspeech 2019, 4340-4344, DOI: 10.21437/Interspeech.2019-2280.


@inproceedings{Naini2019,
  author={Abinay Reddy Naini and Achuth Rao M.V. and Prasanta Kumar Ghosh},
  title={{Whisper to Neutral Mapping Using Cosine Similarity Maximization in i-Vector Space for Speaker Verification}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4340--4344},
  doi={10.21437/Interspeech.2019-2280},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2280}
}