This paper reports our recent progress in using multilingual data to
improve speech-to-text (STT) systems that can be easily delivered.
We continued BBN's earlier work on using multilingual data to improve
Babel evaluation systems, but focused on training time-delay
neural network (TDNN) based chain models. As in the Babel evaluations,
we used multilingual data in two ways: first, to train multilingual
deep neural networks (DNNs) for extracting bottleneck (BN) features,
and second, to initialize training on target languages.
Our results show that
TDNN chain models trained on multilingual DNN bottleneck features yield
significant gains over their counterparts trained on MFCC plus i-vector
features. When initialized from models trained on multilingual data,
TDNN chain models achieve substantial improvements over random initialization
of the network weights on target languages. Two other important findings
are: 1) initialization with multilingual TDNN chain models produces
larger gains on target languages that have less training data; 2) inclusion
of target languages in multilingual training for either BN feature
extraction or initialization has limited impact on performance measured
on the target languages. Our results also reveal that for TDNN chain
models, the combination of multilingual BN features and multilingual
initialization achieves the best performance on all target languages.
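
The two uses of multilingual data described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a toy feed-forward network in plain Python rather than a TDNN chain model, and all layer sizes, function names (`make_network`, `init_from_multilingual`, `forward_to_layer`), and the ReLU activation at the bottleneck are assumptions chosen for illustration. It shows (1) taking the activations of a narrow hidden layer of a multilingual network as BN features, and (2) initializing a target-language network by copying the multilingual hidden layers and replacing only the output layer.

```python
import copy
import random


def make_layer(n_in, n_out, rng):
    """Return a randomly initialized dense layer (weights and zero biases)."""
    return {
        "W": [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def make_network(layer_sizes, rng):
    """Build a list of dense layers from consecutive layer sizes."""
    return [make_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def forward_to_layer(net, x, last_layer):
    """Run x through layers 0..last_layer (ReLU) and return the activations."""
    for layer in net[: last_layer + 1]:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(layer["W"], layer["b"])]
    return x


def init_from_multilingual(multi_net, n_target_outputs, rng):
    """Copy all hidden layers; give the target network a fresh output layer."""
    target_net = copy.deepcopy(multi_net[:-1])
    n_hidden = len(multi_net[-1]["W"][0])  # input width of the old output layer
    target_net.append(make_layer(n_hidden, n_target_outputs, rng))
    return target_net


rng = random.Random(0)

# Hypothetical sizes: 40-dim input, 64 hidden, 8-dim bottleneck, 64 hidden,
# 500 multilingual output units. A real system would train this network first.
multi_net = make_network([40, 64, 8, 64, 500], rng)

# Use 1: bottleneck features are the activations of the narrow layer (index 1).
frame = [rng.uniform(-1.0, 1.0) for _ in range(40)]
bn_features = forward_to_layer(multi_net, frame, last_layer=1)

# Use 2: initialize a target-language network with 120 (hypothetical) outputs.
target_net = init_from_multilingual(multi_net, n_target_outputs=120, rng=rng)
```

In this sketch the target network would then be trained further on target-language data; only its output layer starts from random weights, whereas a randomly initialized baseline would start every layer from scratch.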
Cite as: Ma, J., Keith, F., Ng, T., Siu, M.-H., Kimball, O. (2017) Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. Proc. Interspeech 2017, 127-131, doi: 10.21437/Interspeech.2017-1058
@inproceedings{ma17_interspeech,
  author={Jeff Ma and Francis Keith and Tim Ng and Man-Hung Siu and Owen Kimball},
  title={{Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={127--131},
  doi={10.21437/Interspeech.2017-1058}
}