In this paper, we describe a method to derive a phonetic pronunciation of a word using only an acoustic utterance of that word, without a priori knowledge of its spelling. In [5] and [6], we used a pronunciation model based on bigram statistics. Bigram statistics constrain each phone only by its left neighbor and therefore result in phone sequences that are only pairwise appropriate. Here, we apply a pronunciation model in combination with a phonotactic model, which acts as a language model to constrain the phone sequences produced. Error rates with and without the phonotactic model are presented.
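To illustrate what "only pairwise appropriate" means, the following is a minimal sketch of a phone bigram scorer. It is not the system described in the paper: the toy training sequences, the function names `train_bigram` and `bigram_log_prob`, and the add-alpha smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count phone bigrams, padding each sequence with <s> and </s>."""
    pair_counts = Counter()
    left_counts = Counter()
    for phones in sequences:
        padded = ["<s>"] + list(phones) + ["</s>"]
        for left, right in zip(padded, padded[1:]):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return pair_counts, left_counts


def bigram_log_prob(phones, pair_counts, left_counts, vocab_size, alpha=1.0):
    """Log-probability of a phone sequence under an add-alpha smoothed bigram.

    Each phone is conditioned on its left neighbor only, so the score
    says nothing about context wider than one adjacent pair.
    """
    padded = ["<s>"] + list(phones) + ["</s>"]
    total = 0.0
    for left, right in zip(padded, padded[1:]):
        numerator = pair_counts[(left, right)] + alpha
        denominator = left_counts[left] + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total


# Hypothetical toy data: three short phone sequences.
training = [["K", "AE", "T"], ["K", "AA", "R"], ["B", "AE", "T"]]
pair_counts, left_counts = train_bigram(training)

# Possible right-hand symbols: every phone plus the end marker.
vocab = {p for seq in training for p in seq} | {"</s>"}

seen = bigram_log_prob(["K", "AE", "T"], pair_counts, left_counts, len(vocab))
unseen = bigram_log_prob(["T", "K", "AE"], pair_counts, left_counts, len(vocab))
```

In this toy setting, the sequence built from observed phone pairs scores higher than the reordered one. Because each factor looks at a single adjacent pair, a sequence can still score well while being implausible over a longer span, which is the limitation a model with wider constraints is meant to address.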
Cite as: Ramabhadran, B., Deligne, S., Ittycheriah, A. (1999) Acoustics-based baseform generation with pronunciation and/or phonotactic models. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 507-510, doi: 10.21437/Eurospeech.1999-130
@inproceedings{ramabhadran99_eurospeech,
  author={Bhuvana Ramabhadran and Sabine Deligne and Abraham Ittycheriah},
  title={{Acoustics-based baseform generation with pronunciation and/or phonotactic models}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={507--510},
  doi={10.21437/Eurospeech.1999-130}
}