Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations

Akash Kumar Dhaka, Giampiero Salvi


We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition with limited linguistically annotated material. Our method combines sparse autoencoders with feed-forward networks, thus taking advantage of both unlabelled and labelled data simultaneously through mini-batch stochastic gradient descent. We tested the method with varying proportions of labelled vs unlabelled observations in frame-based phoneme classification on the TIMIT database. Our experiments show that the method outperforms standard supervised models of similar complexity for an equal amount of labelled data and provides competitive error rates compared to state-of-the-art graph-based semi-supervised learning techniques.
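As a rough illustration of how labelled and unlabelled frames might contribute to a single mini-batch objective, the sketch below combines a reconstruction error, a KL-divergence sparsity penalty, and a cross-entropy term that is computed only on the labelled frames. It is not the authors' exact formulation: the function name `joint_loss`, the weights `alpha` and `beta`, the sparsity target `rho`, and the dictionary layout of each frame are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def kl_sparsity(rho, rho_hat, eps=1e-8):
    """KL divergence between a target activation rho and an observed mean rho_hat."""
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def joint_loss(batch, alpha=0.5, beta=0.1, rho=0.05):
    """Hypothetical joint objective for one mini-batch.

    Each frame is a dict with keys:
      x      : input features
      x_hat  : autoencoder reconstruction of x
      hidden : hidden-layer activations in (0, 1)
      logits : classifier outputs
      label  : class index, or None for an unlabelled frame
    """
    n = len(batch)

    # Reconstruction error uses every frame, labelled or not.
    recon = sum(
        sum((a - b) ** 2 for a, b in zip(f["x"], f["x_hat"])) for f in batch
    ) / n

    # Sparsity penalty on the mean activation of each hidden unit.
    n_hidden = len(batch[0]["hidden"])
    mean_act = [sum(f["hidden"][j] for f in batch) / n for j in range(n_hidden)]
    sparsity = sum(kl_sparsity(rho, r) for r in mean_act)

    # Cross-entropy uses only the labelled frames.
    labelled = [f for f in batch if f["label"] is not None]
    ce = 0.0
    if labelled:
        ce = -sum(
            math.log(softmax(f["logits"])[f["label"]]) for f in labelled
        ) / len(labelled)

    # alpha trades off the supervised and unsupervised terms (an assumption).
    return alpha * ce + (1.0 - alpha) * recon + beta * sparsity
```

With this layout, a batch containing only unlabelled frames still yields a training signal through the reconstruction and sparsity terms, while labelled frames additionally contribute the classification term.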


DOI: 10.21437/GLU.2017-5

Cite as: Kumar Dhaka, A., Salvi, G. (2017) Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations. Proc. GLU 2017 International Workshop on Grounding Language Understanding, 22-26, DOI: 10.21437/GLU.2017-5.


@inproceedings{KumarDhaka2017,
  author={Akash {Kumar Dhaka} and Giampiero Salvi},
  title={Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations},
  year=2017,
  booktitle={Proc. GLU 2017 International Workshop on Grounding Language Understanding},
  pages={22--26},
  doi={10.21437/GLU.2017-5},
  url={http://dx.doi.org/10.21437/GLU.2017-5}
}