ISCA Archive Interspeech 2013

Speaker verification based on fusion of acoustic and articulatory information

Ming Li, Jangwon Kim, Prasanta Kumar Ghosh, Vikram Ramanarayanan, Shrikanth Narayanan

We propose a practical, feature-level fusion approach for combining acoustic and articulatory information in the speaker verification task. We find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves overall speaker verification performance. However, since access to measured articulatory data is impractical for real-world speaker verification applications, we also experiment with estimated articulatory features obtained using an acoustic-to-articulatory inversion technique. Specifically, we show that augmenting MFCCs with articulatory features obtained from a subject-independent acoustic-to-articulatory inversion technique also significantly enhances speaker verification performance. This performance boost could be due to the information about inter-speaker variation present in the estimated articulatory features, especially at the mean and variance level. Experimental results on the Wisconsin X-Ray Microbeam database show that the proposed acoustic-estimated-articulatory fusion approach significantly outperforms the traditional acoustic-only baseline, providing up to 10% relative reduction in Equal Error Rate (EER). We further show that we can achieve an additional 5% relative reduction in EER after score-level fusion.
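To make the terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes feature-level fusion means frame-wise concatenation of two time-aligned feature matrices, approximates the EER with a simple threshold sweep, and computes a relative EER reduction. The function names and the feature dimensions (13 and 6) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_features(mfcc, artic):
    """Frame-wise concatenation: (T, Dm) and (T, Da) -> (T, Dm + Da).

    Assumes both streams are already time-aligned (same number of frames).
    """
    mfcc = np.asarray(mfcc, dtype=float)
    artic = np.asarray(artic, dtype=float)
    if mfcc.shape[0] != artic.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.hstack([mfcc, artic])


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.10 -> 0.09 is a 10% relative reduction."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical shapes: 100 frames, 13 MFCCs, 6 articulatory features.
fused = fuse_features(np.zeros((100, 13)), np.ones((100, 6)))
```

Note that a relative reduction is measured against the baseline error: under this definition, an EER falling from 10% to 9% is a 10% relative reduction, not a 1% one.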


doi: 10.21437/Interspeech.2013-405

Cite as: Li, M., Kim, J., Ghosh, P.K., Ramanarayanan, V., Narayanan, S. (2013) Speaker verification based on fusion of acoustic and articulatory information. Proc. Interspeech 2013, 1614-1618, doi: 10.21437/Interspeech.2013-405

@inproceedings{li13d_interspeech,
  author={Ming Li and Jangwon Kim and Prasanta Kumar Ghosh and Vikram Ramanarayanan and Shrikanth Narayanan},
  title={{Speaker verification based on fusion of acoustic and articulatory information}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1614--1618},
  doi={10.21437/Interspeech.2013-405}
}