Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools

Cassio Batista, Ana Larissa Dias, Nelson Sampaio Neto


Kaldi has become a very popular toolkit for automatic speech recognition, showing considerable improvements through the combination of hidden Markov models (HMMs) and deep neural networks (DNNs). However, despite its strong performance for some languages (e.g., English, Italian, and Serbian), the resources for Brazilian Portuguese (BP) are still quite limited. This work describes what appears to be the first attempt to create Kaldi-based scripts and baseline acoustic models for BP. Experiments were carried out for dictation tasks, and Kaldi was compared to the CMU Sphinx toolkit in terms of word error rate (WER). Results seem promising: Kaldi achieved the lowest WER overall, 4.75%, with HMM-DNN, and outperformed CMU Sphinx even when using Gaussian mixture models only.
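
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output (substitutions + deletions + insertions), divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring code used in the paper; the function name word_error_rate and the example sentences are hypothetical, and plain whitespace tokenization is assumed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (case, punctuation), which real scoring tools may do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion
    # against a five-word reference.
    wer = word_error_rate("o gato dorme no sofá", "o gato dome no")
    print(f"WER: {wer:.2%}")
```

Under this definition, a WER of 4.75% corresponds to fewer than five word errors per hundred reference words on average.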


DOI: 10.21437/IberSPEECH.2018-17

Cite as: Batista, C., Dias, A.L., Sampaio Neto, N. (2018) Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools. Proc. IberSPEECH 2018, 77-81, DOI: 10.21437/IberSPEECH.2018-17.


@inproceedings{Batista2018,
  author={Cassio Batista and Ana Larissa Dias and Nelson {Sampaio Neto}},
  title={{Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={77--81},
  doi={10.21437/IberSPEECH.2018-17},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-17}
}