Transliteration converts words in a source language (e.g., English) into phonetically equivalent words in a target language (e.g., Vietnamese). It is therefore used to handle out-of-vocabulary (OOV) words adopted from foreign languages in automatic speech recognition and keyword search systems. While statistical transliteration approaches have been widely adopted, they may not always be suitable for under-resourced languages, where training data is scarce. In this work, we present a rule-based Vietnamese transliteration framework suitable for spoken language applications with minimal linguistic resources. We show that the proposed system outperforms statistical baselines by up to 81.70% relative when training examples are limited (94 word pairs). In addition, we investigate the trade-off between training corpus size and transliteration performance for different methods on two distinct corpora. We also show that the proposed model outperforms statistical baselines by up to 36.76% relative in keyword search tasks.
Bibliographic reference. Ngo, Hoang Gia / Chen, Nancy F. / Sivadas, Sunil / Ma, Bin / Li, Haizhou (2014): "A minimal-resource transliteration framework for Vietnamese", in INTERSPEECH-2014, 1410-1414.