ISCA Archive SLTU 2014

Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks

Sakriani Sakti, Satoshi Nakamura

With the advent of globalization, multilingualism in Indonesia is gradually approaching a state of catastrophe. Of the 726 ethnic languages currently spoken in the Indonesian archipelago, 146 are endangered. Several cultural preservation projects have been initiated to prevent endangered languages from being lost. Nevertheless, technology that could support communication within indigenous communities, as well as with people outside those communities, is still very rare in Indonesia. Speech translation is one technology that may help indigenous communities in Indonesia overcome language barriers, bridge cultural gaps, and face globalization. Our long-term goal is to establish an infrastructure for speech translation from ethnic languages to English/Indonesian. This paper presents recent progress in data resource collection and speech recognition system development for four major Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks.

Index Terms: Language preservation, Indonesian ethnic languages, speech data collection, speech recognition system.


Cite as: Sakti, S., Nakamura, S. (2014) Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks. Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014), 46-52

@inproceedings{sakti14_sltu,
  author={Sakriani Sakti and Satoshi Nakamura},
  title={{Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks}},
  year=2014,
  booktitle={Proc. 4th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2014)},
  pages={46--52}
}