The Vicomtech-PRHLT Speech Transcription Systems for the IberSPEECH-RTVE 2018 Speech to Text Transcription Challenge

Haritz Arzelus, Aitor Alvarez, Conrad Bernath, Eneritz García, Emilio Granell, Carlos David Martinez Hinarejos


This paper describes our joint submission to the IberSPEECH-RTVE Speech to Text Transcription Challenge 2018, which calls for automatic speech transcription systems to be evaluated on realistic TV shows. For building and evaluating systems, RTVE licensed around 569 hours of different TV programs, which were processed, re-aligned and revised in order to discard segments with imperfect transcriptions. This task reduced the corpus to 136 hours, which we considered nearly perfectly aligned audio and employed as in-domain data to train acoustic models. A total of 6 systems were built and presented to the evaluation challenge, three systems per condition. These recognition engines are different versions, evolutions and configurations of two main architectures. The first architecture includes a hybrid LSTM-HMM acoustic model, where bidirectional LSTMs were trained to provide posterior probabilities for the HMM states. The language model corresponds to modified Kneser-Ney smoothed 3-gram and 9-gram models, used for decoding and for re-scoring of the lattices respectively. The second architecture includes an End-to-End (E2E) recognition system, which combines 2D convolutional neural networks as spectral feature extractors from spectrograms with bidirectional Gated Recurrent Units as RNN acoustic models. A modified Kneser-Ney smoothed 5-gram model was also integrated to re-score the E2E hypotheses. All the systems' outputs were then punctuated using bidirectional RNN models with an attention mechanism and capitalized through recasing techniques.
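
The re-scoring step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the add-one smoothed bigram model is a toy stand-in for the modified Kneser-Ney 5-gram model, and the function names, the `lm_weight` parameter and the example scores are all assumptions made for illustration. The idea shown is only that each recognizer hypothesis receives a combined score (recognizer score plus weighted language model log-probability) and the list is re-ranked.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (toy stand-in LM)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_logprob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability (NOT Kneser-Ney; illustration only)."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rescore(nbest, unigrams, bigrams, lm_weight=1.0):
    """Re-rank (text, recognizer_log_score) pairs by recognizer score + weighted LM score."""
    scored = [
        (score + lm_weight * lm_logprob(text.split(), unigrams, bigrams), text)
        for text, score in nbest
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Hypothetical training text and n-best list, invented for this example.
unigrams, bigrams = train_bigram([
    "el tiempo es bueno".split(),
    "el tiempo es malo".split(),
])
nbest = [("el tiempo es bueno", -10.0), ("el tiempos bueno", -9.9)]
best = rescore(nbest, unigrams, bigrams)[0]
```

In this toy case the second hypothesis has the slightly better recognizer score, but the language model penalizes its unseen word sequence, so the first hypothesis is ranked on top after re-scoring.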


DOI: 10.21437/IberSPEECH.2018-56

Cite as: Arzelus, H., Alvarez, A., Bernath, C., García, E., Granell, E., Martinez Hinarejos, C.D. (2018) The Vicomtech-PRHLT Speech Transcription Systems for the IberSPEECH-RTVE 2018 Speech to Text Transcription Challenge. Proc. IberSPEECH 2018, 267-271, DOI: 10.21437/IberSPEECH.2018-56.


@inproceedings{Arzelus2018,
  author={Haritz Arzelus and Aitor Alvarez and Conrad Bernath and Eneritz García and Emilio Granell and Carlos David {Martinez Hinarejos}},
  title={{The Vicomtech-PRHLT Speech Transcription Systems for the IberSPEECH-RTVE 2018 Speech to Text Transcription Challenge}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={267--271},
  doi={10.21437/IberSPEECH.2018-56},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-56}
}