11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Decision Tree State Clustering with Word and Syllable Features

Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan

Google, USA

In large vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to the commonly used phonetically based questions, others have proposed further questions, such as the position of the phone within the word or syllable. This paper examines using the word or syllable context itself as a feature in the decision tree, providing an elegant way of introducing word- or syllable-specific models into the system. Positive results are reported on two state-of-the-art systems, a voicemail transcription task and a search-by-voice task, across a variety of acoustic model and training set sizes.
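The idea of letting word identity compete with phonetic questions can be illustrated with a minimal sketch. The abstract does not give the paper's question sets, features, or splitting criterion, so everything below is an assumption: the sketch uses the common single-Gaussian likelihood-gain criterion on one-dimensional toy statistics, and the names `StateStats`, `best_question`, `toy_states`, and `questions` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StateStats:
    """Sufficient statistics for one context-dependent state (1-D toy features)."""
    left: str        # left phone context
    right: str       # right phone context
    word: str        # word the phone occurs in (the additional feature)
    count: float     # number of frames
    total: float     # sum of observations
    total_sq: float  # sum of squared observations


def log_likelihood(states):
    """Log-likelihood of the pooled frames under one ML-estimated Gaussian."""
    n = sum(s.count for s in states)
    if n == 0:
        return 0.0
    mean = sum(s.total for s in states) / n
    # Floor the variance so the logarithm stays finite.
    var = max(sum(s.total_sq for s in states) / n - mean * mean, 1e-4)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_question(states, questions):
    """Return (name, gain) of the yes/no question with the largest likelihood gain."""
    base = log_likelihood(states)
    best_name, best_gain = None, 0.0
    for name, predicate in questions.items():
        yes = [s for s in states if predicate(s)]
        no = [s for s in states if not predicate(s)]
        if not yes or not no:
            continue  # the question does not split this node
        gain = log_likelihood(yes) + log_likelihood(no) - base
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Toy statistics (invented for illustration): the same phone in the same left
# context, but its acoustics differ by word rather than by right context.
toy_states = [
    StateStats("t", "sil", "to", 100.0, 0.0, 100.0),
    StateStats("t", "b", "to", 100.0, 0.0, 100.0),
    StateStats("t", "sil", "two", 100.0, 200.0, 500.0),
    StateStats("t", "b", "two", 100.0, 200.0, 500.0),
]

# Phonetic questions and a word-identity question compete on equal terms.
questions = {
    "right=sil": lambda s: s.right == "sil",
    "left=t": lambda s: s.left == "t",
    "word=to": lambda s: s.word == "to",
}

name, gain = best_question(toy_states, questions)
print(name, round(gain, 1))
```

On this toy data the word question is selected, because the phonetic questions either fail to split the node or yield no likelihood gain. In a real system the word or syllable question would be chosen only at nodes where it beats every phonetic question, which is how word-specific models would arise only where the data supports them.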

Bibliographic reference. Liao, Hank / Alberti, Chris / Bacchiani, Michiel / Siohan, Olivier (2010): "Decision tree state clustering with word and syllable features", in INTERSPEECH-2010, 2958-2961.