Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production

Philip Weber, Linxue Bai, Martin Russell, Peter Jančovič, Stephen Houghton


Low-dimensional ‘bottleneck’ features extracted from neural networks have been shown to give phoneme recognition accuracy similar to that obtained with higher-dimensional MFCCs, using GMM-HMM models. Such features have also been shown to preserve the assumptions about speech trajectory dynamics made by dynamic models of speech such as Continuous-State HMMs. However, little is understood about how networks derive these features, or about whether and how they can be interpreted in terms of human speech perception and production.

We analyse three-dimensional bottleneck features. We show that for vowels, their spatial representation is very close to the familiar F1:F2 vowel quadrilateral. For other classes of phonemes, the features can similarly be related to phonetic and acoustic spatial representations presented in the literature. This suggests that these networks derive representations specific to particular phonetic categories, with properties similar to those used in human perception. The representation of the full set of phonemes in the bottleneck space is consistent with a hypothesized comprehensive model of speech perception, and also with existing models of speech perception such as prototype theory.
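
As an illustration of what "three-dimensional bottleneck features" means in practice, the following minimal sketch reads the activations of a three-unit hidden layer from a small feed-forward network. It is not the authors' network: the layer sizes (26 inputs, 64 hidden units, 49 output classes), the sigmoid hidden units, the linear bottleneck, the random untrained weights, and all function names are assumptions chosen only to make the example self-contained.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one fully connected layer with random weights and zero biases.

    The weights are untrained; a real system would learn them by training
    the network to classify phonemes.
    """
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def dense(x, layer, activation):
    """Apply one layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def bottleneck_features(frame, layers, bottleneck_index):
    """Run the forward pass up to the bottleneck layer and return its outputs.

    Layers after the bottleneck are needed for training only, so they are
    never evaluated here.
    """
    h = frame
    for i, layer in enumerate(layers):
        if i == bottleneck_index:
            # Assumption: the bottleneck layer is linear.
            return dense(h, layer, identity)
        h = dense(h, layer, sigmoid)
    raise ValueError("bottleneck_index is outside the network")


rng = random.Random(0)

# Hypothetical layer sizes: input, hidden, bottleneck, hidden, output classes.
sizes = [26, 64, 3, 64, 49]
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# One synthetic acoustic frame standing in for a real feature vector.
frame = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]

features = bottleneck_features(frame, layers, bottleneck_index=1)
print(len(features))  # number of bottleneck dimensions
```

Each input frame maps to one point in a three-dimensional space, so the frames of a phoneme class can be plotted directly and compared with representations such as the F1:F2 vowel quadrilateral.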


DOI: 10.21437/Interspeech.2016-124

Cite as

Weber, P., Bai, L., Russell, M., Jančovič, P., Houghton, S. (2016) Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production. Proc. Interspeech 2016, 3384-3388.

BibTeX
@inproceedings{Weber+2016,
author={Philip Weber and Linxue Bai and Martin Russell and Peter Jančovič and Stephen Houghton},
title={Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-124},
url={http://dx.doi.org/10.21437/Interspeech.2016-124},
pages={3384--3388}
}