EpaDB: A Database for Development of Pronunciation Assessment Systems

Jazmín Vidal, Luciana Ferrer, Leonardo Brambilla


In this paper, we describe the methodology for collecting and annotating a new database designed for research and development on pronunciation assessment. While a significant amount of research has been done in the area of pronunciation assessment, to our knowledge, no database is publicly available for research in the field. To address this need, we created EpaDB (English Pronunciation by Argentinians Database), which is composed of English phrases read by native Spanish speakers with different levels of English proficiency. The recordings are annotated with phrase-level ratings of pronunciation quality and with detailed phonetic alignments and transcriptions indicating which phones were actually pronounced by the speakers. We present inter-rater agreement, the effect of each phone on overall perceived non-nativeness, and the frequency of specific pronunciation errors.


DOI: 10.21437/Interspeech.2019-1839

Cite as: Vidal, J., Ferrer, L., Brambilla, L. (2019) EpaDB: A Database for Development of Pronunciation Assessment Systems. Proc. Interspeech 2019, 589-593, DOI: 10.21437/Interspeech.2019-1839.


@inproceedings{Vidal2019,
  author={Jazmín Vidal and Luciana Ferrer and Leonardo Brambilla},
  title={{EpaDB: A Database for Development of Pronunciation Assessment Systems}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={589--593},
  doi={10.21437/Interspeech.2019-1839},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1839}
}