ISCA Archive Interspeech 2008

Fragmented context-dependent syllable acoustic models

K. Thambiratnam, Frank Seide

Although the syllable is often touted as an excellent candidate unit for acoustic modeling, past work has yet to demonstrate its value. One reason is that critical factors such as context-dependency and model clustering are typically neglected in syllable-based work. This paper presents fragmented syllable models, a means to realize context-dependency for the syllable while constraining the implied explosion in training data requirements. Fragmented syllables expose only their head and tail phones as context, and thus limit the context space for triphone expansion. Furthermore, decision-tree clustering can be used to share data between parts, or fragments, of syllables, to better exploit training data for data-sparse syllables. The best resulting system achieves a 1.8% absolute (5.4% relative) reduction in WER over a baseline triphone acoustic model on a Switchboard-1 conversational telephone speech task.
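The two reduction figures quoted above are related by simple arithmetic: a relative reduction is the absolute reduction divided by the baseline WER. The short Python sketch below works that relationship backwards to show roughly what baseline the two numbers imply. The function name is hypothetical, and the resulting values are derived from the rounded figures in the abstract rather than reported by the authors, so they should be read as approximate.

```python
def implied_baseline_wer(absolute_reduction, relative_reduction):
    """Return the baseline WER (in percent) implied by an absolute
    reduction and a relative reduction, both given in percent.

    relative = absolute / baseline  =>  baseline = absolute / relative
    """
    return absolute_reduction / (relative_reduction / 100.0)


# Figures taken from the abstract: 1.8% absolute, 5.4% relative.
baseline = implied_baseline_wer(1.8, 5.4)

# Implied WER of the improved system (assumption: derived, not reported).
improved = baseline - 1.8

print(f"implied baseline WER: about {baseline:.1f}%")
print(f"implied improved WER: about {improved:.1f}%")
```

Because both inputs are rounded to one decimal place, the implied values carry a few tenths of a percent of uncertainty.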


doi: 10.21437/Interspeech.2008-133

Cite as: Thambiratnam, K., Seide, F. (2008) Fragmented context-dependent syllable acoustic models. Proc. Interspeech 2008, 2418-2421, doi: 10.21437/Interspeech.2008-133

@inproceedings{thambiratnam08_interspeech,
  author={K. Thambiratnam and Frank Seide},
  title={{Fragmented context-dependent syllable acoustic models}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2418--2421},
  doi={10.21437/Interspeech.2008-133}
}