Machine Learning Powered Data Platform for High-Quality Speech and NLP Workflows

João Freitas, Jorge Ribeiro, Daan Baldwijns, Sara Oliveira, Daniela Braga


Machine learning (ML) models, such as deep neural networks, require substantial amounts of training data, and that data must be properly annotated to yield satisfactory results. This paper describes a platform designed to create high-quality datasets. Using data workflows adapted for speech technologies and natural language processing systems, the user can collect and enrich speech and text data. Depending on the end goal, the data passes through multiple processing steps based on human input and ML services. To guarantee data quality, the platform combines several mechanisms, such as language tests, real-time audits, and user behavior, into several ML models that act as quality gateways.


Cite as: Freitas, J., Ribeiro, J., Baldwijns, D., Oliveira, S., Braga, D. (2018) Machine Learning Powered Data Platform for High-Quality Speech and NLP Workflows. Proc. Interspeech 2018, 1962-1963.


@inproceedings{Freitas2018,
  author={João Freitas and Jorge Ribeiro and Daan Baldwijns and Sara Oliveira and Daniela Braga},
  title={Machine Learning Powered Data Platform for High-Quality Speech and NLP Workflows},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1962--1963}
}