EUROSPEECH 2003 - INTERSPEECH 2003
Developing robust spoken language technology ideally relies on large amounts of data, preferably in the target domain and language. More often than not, however, speech developers must cope with little or no data, and what is available typically comes from a different domain. This paper focuses on techniques for addressing this challenge. Specifically, we consider the case of developing a Persian speech recognizer from sparse data. For language modeling, several potential sources of text data (e.g., text available on the Internet) can help bootstrap initial models; acoustic data, however, can be obtained only through tedious data collection. The shortage of Persian acoustic data can be partially overcome by using acoustic data from languages with vast resources, such as English (and other languages, if available). Phoneme sets differ considerably, especially between languages as dissimilar as English and Persian. However, by incorporating knowledge-based as well as data-driven phoneme mappings, reliable Persian acoustic models can be trained from well-trained English models and small amounts of Persian re-training data. In our experiments, Persian models re-trained from seed models created by data-driven phoneme mappings of English models achieved a phoneme error rate of 19.80%, compared with 20.35% when the Persian models were re-trained from seed models created from the sparse Persian data.
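The abstract reports results as phoneme error rates but does not describe the scoring procedure. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a Levenshtein alignment, divided by the number of reference phonemes), the metric could be computed as follows. The function names and the example phoneme sequences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # aligning i reference phonemes to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference phoneme missing from hypothesis
            insertion = curr[j - 1] + 1  # extra phoneme in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Percentage of phoneme errors over a set of utterances.

    Assumes the standard definition: total edit distance divided by the
    total number of reference phonemes, times 100.
    """
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("reference transcriptions contain no phonemes")
    return 100.0 * errors / total


# Hypothetical example: one deleted phoneme out of five reference phonemes.
reference = ["s", "a", "l", "a", "m"]
hypothesis = ["s", "a", "l", "m"]
print(f"{phoneme_error_rate([reference], [hypothesis]):.2f}%")
```

Under this definition, the difference between the two reported systems (20.35% versus 19.80%) corresponds to 0.55 fewer errors per 100 reference phonemes.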
Bibliographic reference. Srinivasamurthy, Naveen / Narayanan, Shrikanth (2003): "Language-adaptive Persian speech recognition", In EUROSPEECH-2003, 3137-3140.