9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Fragmented Context-Dependent Syllable Acoustic Models

K. Thambiratnam, Frank Seide

Microsoft Research Asia, China

Though touted as an excellent candidate, past work has yet to demonstrate the value of the syllable for acoustic modeling. One reason is that critical factors such as context-dependency and model clustering are typically neglected in work on syllable units. This paper presents fragmented syllable models, a means of realizing context-dependency for the syllable while constraining the implied explosion in training-data requirements. Fragmented syllables expose only their head/tail phones as context, and thus limit the context space for triphone expansion. Furthermore, decision-tree clustering can be used to share data between parts, or fragments, of syllables, to better exploit training data for data-sparse syllables. The best resulting system achieves a 1.8% absolute (5.4% relative) reduction in WER over a baseline triphone acoustic model on a Switchboard-1 conversational telephone speech task.
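The core idea — exposing only a syllable's head and tail phones to cross-unit context while keeping interior fragments context-independent — can be illustrated with a small sketch. This is not the paper's implementation; the fragmentation scheme, unit naming (HTK-style `left-phone+right`), and all function names below are illustrative assumptions.

```python
# Hypothetical sketch of fragmented context-dependent syllable expansion.
# A syllable's phone string is split into head, body, and tail fragments;
# only the head and tail fragments see cross-syllable phonetic context
# (triphone-style), while the body fragment stays context-independent.
# Fragmentation scheme and naming are illustrative, not from the paper.

def fragment_syllable(phones):
    """Split a syllable's phone sequence into (head, body, tail) fragments."""
    if len(phones) == 1:
        return phones[0], None, None        # single-phone syllable: head only
    head, tail = phones[0], phones[-1]
    body = tuple(phones[1:-1]) or None      # may be empty for 2-phone syllables
    return head, body, tail

def expand_units(syllables):
    """Expand a syllable sequence into fragment model names, exposing only
    head/tail phones to cross-syllable context."""
    units = []
    for i, phones in enumerate(syllables):
        head, body, tail = fragment_syllable(phones)
        left = syllables[i - 1][-1] if i > 0 else "sil"
        right = syllables[i + 1][0] if i + 1 < len(syllables) else "sil"
        if tail is None:                    # single-phone syllable
            units.append(f"{left}-{head}+{right}")
            continue
        inner_r = body[0] if body else tail
        units.append(f"{left}-{head}+{inner_r}")   # head: left context exposed
        if body:
            units.append("_".join(body))           # body: context-independent
        inner_l = body[-1] if body else head
        units.append(f"{inner_l}-{tail}+{right}")  # tail: right context exposed
    return units

# e.g. "seven" as two syllables /s eh v/ + /ax n/
print(expand_units([("s", "eh", "v"), ("ax", "n")]))
```

Because only head/tail fragments carry cross-unit context, the number of distinct context-dependent units grows with the phone inventory rather than the full syllable inventory, which is what keeps training-data requirements tractable.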


Bibliographic reference. Thambiratnam, K. / Seide, Frank (2008): "Fragmented context-dependent syllable acoustic models", in INTERSPEECH-2008, 2418-2421.