Intelligent Voice ASR system for Iberspeech 2018 Speech to Text Transcription Challenge

Nazim Dugan, Cornelius Glackin, Gérard Chollet, Nigel Cannings


The ground truth transcriptions provided for training and development are cleaned up with customized scripts and realigned using a two-step alignment procedure that uses word lattices produced by a previous ASR system trained for European Spanish. An utterance-level selection mechanism is applied to the training and development data by calculating the word error rate (WER) of each utterance against the output of the previous ASR system. Applying a selection criterion to these utterance-level scores yields 261 hours of data from the train and dev1 subsections of the provided data. The selected data is merged with the 91 hours of training data of the previous ASR system, and 3-times data augmentation is applied through reverberation using a noise corpus. The resulting 1057 hours of training data are used to train an nnet3 chain acoustic model in the Kaldi framework, with MFCCs and iVectors as input features; iterative GMM phone alignment is performed before neural network training starts. The selected text of the train and dev1 subsections is also used to add new pronunciations and to adapt the language model (LM) of the previous ASR system. The resulting model is tested on data from the dev2 subsection, selected with the same procedure as the training data.
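The utterance-level selection step can be illustrated with a minimal sketch, assuming each utterance's cleaned reference transcription is scored against the previous system's hypothesis with a standard word-level WER, and that utterances are kept when their WER falls at or below a threshold. The abstract does not state the actual selection criterion, so the `max_wer=0.2` default and the function names `word_errors`, `utterance_wer`, and `select_utterances` are hypothetical.

```python
from typing import Dict, List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance over words)."""
    # prev[j] is the edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution, or match when cost == 0
        prev = curr
    return prev[len(hyp)]


def utterance_wer(ref: str, hyp: str) -> float:
    """WER of one utterance: word errors divided by the number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        # WER is undefined for an empty reference; treat it as unusable unless both are empty.
        return 0.0 if not hyp_words else float("inf")
    return word_errors(ref_words, hyp_words) / len(ref_words)


def select_utterances(refs: Dict[str, str], hyps: Dict[str, str],
                      max_wer: float = 0.2) -> List[str]:
    """Keep utterance IDs whose WER against the previous system's output is <= max_wer.

    max_wer is a hypothetical threshold, not the value used in the paper.
    """
    selected = []
    for utt_id, ref in refs.items():
        hyp = hyps.get(utt_id, "")  # a missing hypothesis counts as all deletions
        if utterance_wer(ref, hyp) <= max_wer:
            selected.append(utt_id)
    return selected


if __name__ == "__main__":
    refs = {"utt1": "hola buenos días", "utt2": "el tiempo es bueno hoy"}
    hyps = {"utt1": "hola buenos días", "utt2": "el tiempo bueno"}
    print(select_utterances(refs, hyps))  # utt2 has 2 deletions out of 5 words (WER 0.4)
```

In this toy example `utt2` is rejected because its WER of 0.4 exceeds the assumed threshold. The underlying intuition is that utterances the previous system already recognizes well are more likely to have accurate transcriptions and alignments. Filtering on a per-utterance score, rather than on whole recordings, keeps usable segments even from recordings with some badly transcribed parts.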


DOI: 10.21437/IberSPEECH.2018-57

Cite as: Dugan, N., Glackin, C., Chollet, G., Cannings, N. (2018) Intelligent Voice ASR system for Iberspeech 2018 Speech to Text Transcription Challenge. Proc. IberSPEECH 2018, 272-276, DOI: 10.21437/IberSPEECH.2018-57.


@inproceedings{Dugan2018,
  author={Nazim Dugan and Cornelius Glackin and Gérard Chollet and Nigel Cannings},
  title={{Intelligent Voice ASR system for Iberspeech 2018 Speech to Text Transcription Challenge}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={272--276},
  doi={10.21437/IberSPEECH.2018-57},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-57}
}