8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Multilingual Phone Clustering for Recognition of Spontaneous Indonesian Speech Utilising Pronunciation Modelling Techniques

Eddie Wong (1), Terrence Martin (1), Torbjorn Svendsen (2), Sridha Sridharan (1)

(1) Queensland University of Technology, Australia
(2) Norwegian University of Science and Technology, Norway

In this paper, a multilingual acoustic model set derived from English, Hindi, and Spanish is utilised to recognise speech in Indonesian. To achieve this, we adopt a two-tiered approach to the cross-lingual porting of the multilingual models to a new language. In the first stage, we use an entropy-based decision tree to merge similar phones from different languages into clusters, forming a new multilingual model set. In the second stage, we propose a cross-lingual pronunciation modelling technique to perform the mapping from the multilingual models to the Indonesian phone set. A set of mapping rules is derived from this process and employed to convert the original Indonesian lexicon into a pronunciation lexicon expressed in terms of the multilingual model set. Preliminary experimental results show that, compared to the common knowledge-based approach, both of these techniques reduce the word error rate in a spontaneous speech recognition task.
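
As a rough illustration of the two stages described above, the following Python sketch shows one way an entropy-based merging criterion and a rule-based lexicon conversion could look. It is not the authors' implementation: the function names, the phone and model labels, the use of discrete label counts to represent each phone, and the context-independent one-to-one mapping rules are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as label counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def merge_cost(a, b):
    """Increase in count-weighted entropy when two phone clusters are pooled.

    A small cost suggests the two phones are acoustically similar and are
    reasonable candidates for sharing one multilingual model.
    """
    n_a, n_b = sum(a.values()), sum(b.values())
    pooled = a + b
    return (n_a + n_b) * entropy(pooled) - n_a * entropy(a) - n_b * entropy(b)


def convert_lexicon(lexicon, rules):
    """Rewrite each pronunciation in terms of the multilingual model set.

    Assumes context-independent rules; phones without a rule are kept as-is.
    """
    return {word: [rules.get(phone, phone) for phone in phones]
            for word, phones in lexicon.items()}


# Hypothetical counts of quantised acoustic labels observed for three phones.
english_s = Counter({"c1": 8, "c2": 2})
spanish_s = Counter({"c1": 7, "c2": 3})
hindi_a = Counter({"c3": 9, "c4": 1})

similar = merge_cost(english_s, spanish_s)
dissimilar = merge_cost(english_s, hindi_a)

# Hypothetical mapping rules and a one-word Indonesian lexicon.
rules = {"s": "S_EN_ES", "a": "A_HI_ES"}
lexicon = {"satu": ["s", "a", "t", "u"]}
converted = convert_lexicon(lexicon, rules)
```

In this toy setting, pooling the two similar distributions costs less than pooling the dissimilar pair, so a greedy clustering procedure would merge them first. A real system would compute the criterion over acoustic model statistics and would likely use context-dependent mapping rules.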


Bibliographic reference. Wong, Eddie / Martin, Terrence / Svendsen, Torbjorn / Sridharan, Sridha (2003): "Multilingual phone clustering for recognition of spontaneous Indonesian speech utilising pronunciation modelling techniques", In EUROSPEECH-2003, 3133-3136.