Building the Singapore English National Speech Corpus

Jia Xin Koh, Aqilah Mislan, Kevin Khoo, Brian Ang, Wilson Ang, Charmaine Ng, Ying-Ying Tan


The National Speech Corpus (NSC) is the first large-scale Singapore English corpus, spearheaded by the Info-communications and Media Development Authority of Singapore. It aims to become an important source of open speech data for automatic speech recognition (ASR) research and speech-related applications. The first release of the corpus features more than 2000 hours of orthographically transcribed read speech data, designed to include locally relevant words. It is available for public and commercial use upon request at “www.imda.gov.sg/nationalspeechcorpus”, under the Singapore Open Data License. An accompanying lexicon is in preparation and will be published soon. In addition, another 1000 hours of conversational speech data will be made available in the near future in the second release of NSC. This paper reports on the development and collection process of the read speech and conversational speech corpora.


DOI: 10.21437/Interspeech.2019-1525

Cite as: Koh, J.X., Mislan, A., Khoo, K., Ang, B., Ang, W., Ng, C., Tan, Y. (2019) Building the Singapore English National Speech Corpus. Proc. Interspeech 2019, 321-325, DOI: 10.21437/Interspeech.2019-1525.


@inproceedings{Koh2019,
  author={Jia Xin Koh and Aqilah Mislan and Kevin Khoo and Brian Ang and Wilson Ang and Charmaine Ng and Ying-Ying Tan},
  title={{Building the Singapore English National Speech Corpus}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={321--325},
  doi={10.21437/Interspeech.2019-1525},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1525}
}