ISCA Archive Interspeech 2006

Combining multiple-sized sub-word units in a speech recognition system using baseform selection

T. Nagarajan, P. Vijayalakshmi, Douglas O'Shaughnessy

Longer-sized sub-word units are known to be better candidates for the development of a continuous speech recognition system. However, the basic problem with such units is data sparsity. To overcome this problem, researchers have tried to combine longer-sized sub-word unit models with phoneme models. In this paper, we consider only frequently occurring syllables and VC (Vowel + Consonant) units, together with phone-sized units (monophones and triphones), for the development of a continuous speech recognition system. In such a case, even a single pronunciation of a word can have multiple representational baseforms in the lexicon, each built from different-sized units. We show that a considerable improvement in recognition performance can be achieved if the baseforms are selected properly. Of all possible baseforms for a given word in the lexicon, that is, all possible concatenations of sub-word units that make up the word, only the baseform that maximizes the acoustic likelihood is retained. Since only the acoustically weaker baseforms in the word lexicon of a baseline system (a pure monophone- or triphone-based system) are replaced by baseforms with longer-sized units, the resulting performance is guaranteed to be better than that of the baseline system. Preliminary experiments carried out on the TIMIT speech corpus show a considerable improvement in recognition performance over pure monophone- or triphone-based systems when the larger-sized units are combined using proper selection of baseforms.
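The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `enumerate_baseforms` and `select_baseform`, the unit labels, and the per-unit log-likelihood table are all hypothetical, and the toy scores stand in for acoustic likelihoods that a real system would obtain from its trained models. The sketch enumerates every way to cover a word's phone sequence with phone-sized and longer units, then keeps the candidate with the highest score.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Baseform = Tuple[str, ...]


def enumerate_baseforms(
    phones: Sequence[str],
    longer_units: Dict[Tuple[str, ...], str],
) -> List[Baseform]:
    """List every left-to-right cover of `phones` that uses single phones
    or the longer units (e.g. VC units, syllables) in `longer_units`."""
    if not phones:
        return [()]
    results: List[Baseform] = []
    # Option 1: emit the first phone as a phone-sized unit.
    for rest in enumerate_baseforms(phones[1:], longer_units):
        results.append((phones[0],) + rest)
    # Option 2: emit a longer unit whose phone string matches a prefix.
    for span, name in longer_units.items():
        n = len(span)
        if n > 1 and tuple(phones[:n]) == span:
            for rest in enumerate_baseforms(phones[n:], longer_units):
                results.append((name,) + rest)
    return results


def select_baseform(
    candidates: List[Baseform],
    score: Callable[[Baseform], float],
) -> Baseform:
    """Keep only the candidate with the highest (log-)likelihood score."""
    return max(candidates, key=score)


# Hypothetical inventory: one VC unit and one syllable for a toy word.
LONGER_UNITS = {("ih", "t"): "ih_t", ("s", "ih", "t"): "s_ih_t"}

# Made-up per-unit log-likelihoods, used only to make the sketch runnable.
TOY_LOGLIK = {"s": -4.0, "ih": -5.0, "t": -6.0, "ih_t": -9.5, "s_ih_t": -16.0}


def toy_score(baseform: Baseform) -> float:
    """Sum the toy log-likelihoods of the units in a baseform."""
    return sum(TOY_LOGLIK[unit] for unit in baseform)


phones = ["s", "ih", "t"]
candidates = enumerate_baseforms(phones, LONGER_UNITS)
best = select_baseform(candidates, toy_score)
print(best)
```

Because the pure phone-sized baseform is always among the candidates, the selected baseform can never score lower than it under the chosen scoring function. This mirrors the abstract's argument that only acoustically weaker baseforms get replaced.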


doi: 10.21437/Interspeech.2006-446

Cite as: Nagarajan, T., Vijayalakshmi, P., O'Shaughnessy, D. (2006) Combining multiple-sized sub-word units in a speech recognition system using baseform selection. Proc. Interspeech 2006, paper 1280-Wed1BuP.12, doi: 10.21437/Interspeech.2006-446

@inproceedings{nagarajan06b_interspeech,
  author={T. Nagarajan and P. Vijayalakshmi and Douglas O'Shaughnessy},
  title={{Combining multiple-sized sub-word units in a speech recognition system using baseform selection}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1280-Wed1BuP.12},
  doi={10.21437/Interspeech.2006-446}
}