We consider the problem of learning transformations of acoustic feature vectors for phonetic frame classification, in a multi-view setting where articulatory measurements are available at training time but not at test time. Canonical correlation analysis (CCA) has previously been used to learn linear transformations of the acoustic features that are maximally correlated with articulatory measurements. Here, we learn nonlinear transformations of the acoustics using kernel canonical correlation analysis (KCCA). We present an incremental SVD approach that makes the KCCA computations feasible for typical speech data set sizes. In phonetic frame classification experiments on data drawn from the University of Wisconsin X-ray Microbeam Database, we find that KCCA provides consistent improvements over linear CCA, as well as over single-view unsupervised dimensionality reduction.
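As a point of reference for the linear CCA baseline mentioned above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a small ridge term `reg` for numerical stability. The function name `linear_cca` and the synthetic data are hypothetical and chosen only for illustration. The kernel variant and the incremental SVD described in the abstract are not shown.

```python
import numpy as np

def linear_cca(X, Y, k, reg=1e-6):
    """Regularized linear CCA (illustrative sketch).

    X: (n, dx) view-1 features (e.g. acoustics).
    Y: (n, dy) view-2 features (e.g. articulatory measurements).
    Returns projection matrices A (dx, k), B (dy, k) and the
    top-k canonical correlations.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariances, with a ridge term (an assumption) on the diagonals.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U[:, :k])
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return A, B, s[:k]

# Synthetic two-view data sharing one latent factor (illustration only).
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = z @ np.array([[1.0, -0.5, 0.3, 0.8, -1.2]]) + 0.1 * rng.standard_normal((500, 5))
Y = z @ np.array([[0.7, -1.0, 0.4]]) + 0.1 * rng.standard_normal((500, 3))

A, B, corrs = linear_cca(X, Y, k=2)

# At test time only the first view is needed: project it with A.
X_proj = (X - X.mean(axis=0)) @ A
```

In the multi-view setting described above, `B` and the second view are used only during training. New acoustic frames would be projected with `A` alone.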
Index Terms: multi-view learning, kernel canonical correlation analysis, XRMB, articulatory measurements
Cite as: Arora, R., Livescu, K. (2012) Kernel CCA for multi-view learning of acoustic features using articulatory measurements. Proc. Machine Learning in Speech and Language Processing (MLSLP 2012), 34-37
@inproceedings{arora12_mlslp,
  author={Raman Arora and Karen Livescu},
  title={{Kernel CCA for multi-view learning of acoustic features using articulatory measurements}},
  year=2012,
  booktitle={Proc. Machine Learning in Speech and Language Processing (MLSLP 2012)},
  pages={34--37}
}