EUROSPEECH 2003 - INTERSPEECH 2003
The choice of phonetic units for a speech recognizer greatly affects system performance. Phonetic units are normally defined according to the acoustic properties of segments of speech. However, because training data are limited, overly fine-grained acoustic properties are ignored. Syllable structure is one such property: it is usually ignored in English phonetic units because of its complexity. Languages such as Chinese successfully benefit from incorporating this property into the phonetic units, since they are naturally syllabic and have only a small number of subsegments (onsets, nuclei, and codas). Thai lies between English and Chinese: it has more subsegments than Chinese but fewer than English. This paper takes two main steps. First, it shows that Thai phonetic units can be defined as a set of subsegments without any data sparseness problem. Second, it demonstrates that subsegmental phonetic units improve the accuracy rate by integrating syllable structure information, and that they greatly reduce the number of triphone units because syllable structure constrains the left and right contexts.
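To illustrate why syllable-position constraints shrink the triphone inventory, the sketch below counts triphones over a tiny, entirely hypothetical unit set (three onsets, two nuclei, two codas; not the paper's actual Thai inventory). It assumes a simplified syllable grammar of onset, nucleus, optional coda, tags every unit with its syllable position, and ignores word and utterance boundaries. The names `UNITS`, `ALLOWED_NEXT`, `can_follow`, and `count_triphones` are illustrative only.

```python
from itertools import product

# Hypothetical toy inventory (NOT the paper's Thai unit set). Each unit is
# tagged with its syllable position, so an onset /n/ and a coda /n/ would be
# treated as distinct units.
UNITS = (
    [("onset", p) for p in ("k", "p", "kl")]
    + [("nucleus", p) for p in ("a", "i")]
    + [("coda", p) for p in ("n", "ng")]
)

# Assumed syllable grammar: onset nucleus [coda]. These are the only
# position-to-position transitions allowed within and across syllables.
ALLOWED_NEXT = {
    "onset": {"nucleus"},
    "nucleus": {"coda", "onset"},  # closed syllable, or open syllable then next onset
    "coda": {"onset"},
}


def can_follow(left, right):
    """True if unit `right` may directly follow unit `left` under the assumed grammar."""
    return right[0] in ALLOWED_NEXT[left[0]]


def count_triphones(constrained):
    """Count (left, centre, right) triples, optionally keeping only syllable-legal ones."""
    count = 0
    for left, centre, right in product(UNITS, repeat=3):
        if not constrained or (can_follow(left, centre) and can_follow(centre, right)):
            count += 1
    return count


print(count_triphones(constrained=False), count_triphones(constrained=True))  # 343 66
```

With 7 units, an unconstrained inventory has 7³ = 343 triphones, while the position constraints leave 66 (24 onset-centred, 30 nucleus-centred, 12 coda-centred). The actual reduction for Thai depends on its real subsegment inventory and phonotactics, which this toy example does not model.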
Bibliographic reference. Kanokphara, Supphanat (2003): "Syllable structure based phonetic units for context-dependent continuous Thai speech recognition", In EUROSPEECH-2003, 797-800.