In this paper we propose a Shared Hidden Layer Multi-softmax Deep Neural Network (SHL-MDNN) approach for semi-supervised training (SST). The approach aims to improve low-resource speech recognition, where only limited training data is available. Supervised and unsupervised data share the same hidden layers but are fed into different softmax layers, so that errors in the automatic speech recognition (ASR) transcriptions of the unsupervised data have less effect on the shared hidden layers. Experimental results on Babel data indicate that this approach consistently outperforms naive SST on a DNN, and that it can yield a 1.3% word error rate (WER) reduction compared with a supervised DNN hybrid system. In addition, retraining the softmax layer with supervised data can give up to a further 0.8% WER reduction. Confidence-based data selection is also studied in this setup. Experiments show that this method is not sensitive to ASR transcription errors.
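The architecture described above (shared hidden layers with a separate softmax output layer for each data source) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The sigmoid activations, layer sizes, initialisation, the head names `"sup"` and `"unsup"`, and the helper `select_confident` with its `threshold` parameter are all hypothetical choices made for illustration. The sketch shows the forward pass and the per-head cross-entropy only, and omits training.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class SharedHiddenMultiSoftmax:
    """Toy SHL-MDNN-style model (hypothetical sketch, not the paper's code).

    All heads share the same hidden layers; each data source gets its own
    softmax output layer. Sizes, sigmoid activations and init are assumptions.
    """

    def __init__(self, dims, n_targets, heads=("sup", "unsup"), seed=0):
        rng = np.random.default_rng(seed)
        # Shared hidden layers: (W, b) pairs used by every head.
        self.hidden = [(rng.normal(0.0, 0.1, (i, o)), np.zeros(o))
                       for i, o in zip(dims[:-1], dims[1:])]
        # One separate softmax layer per data source (assumed head names).
        self.heads = {h: (rng.normal(0.0, 0.1, (dims[-1], n_targets)),
                          np.zeros(n_targets))
                      for h in heads}

    def hidden_forward(self, x):
        # Pass input frames through the shared sigmoid hidden layers.
        for W, b in self.hidden:
            x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
        return x

    def forward(self, x, head):
        # Posteriors from the softmax layer belonging to `head`.
        W, b = self.heads[head]
        return softmax(self.hidden_forward(x) @ W + b)

    def loss(self, x, labels, head):
        # Frame-level cross-entropy against the chosen head's posteriors.
        p = self.forward(x, head)
        return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def select_confident(frames, labels, conf, threshold):
    # Hypothetical confidence-based selection: keep unsupervised frames
    # whose ASR confidence score is at least `threshold`.
    keep = conf >= threshold
    return frames[keep], labels[keep]
```

In this sketch, supervised frames would be scored with `loss(x_sup, y_sup, "sup")` and automatically transcribed frames with `loss(x_unsup, y_unsup, "unsup")`. In a gradient-based training loop, errors in the unsupervised labels would then reach the shared hidden layers, but only the `"unsup"` output layer and never the `"sup"` one. Decoding would use the `"sup"` head. Retraining that head on supervised data alone, with the hidden layers frozen, corresponds to the softmax retraining step mentioned above.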
Bibliographic reference. Su, Hang / Xu, Haihua (2015): "Multi-softmax deep neural network for semi-supervised training", In INTERSPEECH-2015, 3239-3243.