Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis

Qingming Tang, Weiran Wang, Karen Livescu


We study acoustic feature learning in the setting where a second (non-acoustic) modality is available for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA’s advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition.


DOI: 10.21437/Interspeech.2017-1581

Cite as: Tang, Q., Wang, W., Livescu, K. (2017) Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis. Proc. Interspeech 2017, 1656-1660, DOI: 10.21437/Interspeech.2017-1581.


@inproceedings{Tang2017,
  author={Qingming Tang and Weiran Wang and Karen Livescu},
  title={Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1656--1660},
  doi={10.21437/Interspeech.2017-1581},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1581}
}