In large-vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to the commonly used phonetically based questions, previous work has proposed further questions, such as the position of the phone within the word or syllable. This paper examines using the word or syllable context itself as a feature in the decision tree, which provides an elegant way of introducing word- or syllable-specific models into the system. Positive results are reported on two state-of-the-art systems, one for voicemail transcription and one for search by voice, across a variety of acoustic model sizes and training set sizes.
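To make the idea concrete, the sketch below shows one greedy step of likelihood-based decision tree state clustering in which a word-identity question competes with ordinary phonetic questions. It assumes the common textbook formulation (each node modelled by a single diagonal Gaussian estimated from pooled state statistics, and the split chosen by maximum log-likelihood gain). It is not the authors' implementation. The `StateStats` layout, the question names, the `NASALS` set, the word "call" and the toy statistics are all hypothetical and invented for illustration; the contexts are not meant to be phonetically realistic.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StateStats:
    """Accumulated statistics for one context-dependent state (hypothetical layout)."""
    left: str            # left phone context
    right: str           # right phone context
    word: str            # word containing the state: the extra, non-phonetic feature
    count: float         # state occupancy
    sum_x: List[float]   # per-dimension sum of feature vectors
    sum_xx: List[float]  # per-dimension sum of squared features


def cluster_loglik(states: List[StateStats], var_floor: float = 1e-3) -> float:
    """Log-likelihood of the pooled data under one ML-estimated diagonal Gaussian.

    Exact when no variance is floored; if the floor is hit the value is only
    approximate (the floor just guards against log(0)).
    """
    n = sum(s.count for s in states)
    if n == 0:
        return 0.0
    dim = len(states[0].sum_x)
    log_det = 0.0
    for d in range(dim):
        mean = sum(s.sum_x[d] for s in states) / n
        var = sum(s.sum_xx[d] for s in states) / n - mean * mean
        log_det += math.log(max(var, var_floor))
    return -0.5 * n * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


Question = Tuple[str, Callable[[StateStats], bool]]

NASALS = {"m", "n", "ng"}  # illustrative phone class (hypothetical)

# Two phonetic questions plus one word-identity question (all hypothetical).
QUESTIONS: List[Question] = [
    ("L=nasal", lambda s: s.left in NASALS),
    ("R=nasal", lambda s: s.right in NASALS),
    ("word=call", lambda s: s.word == "call"),
]


def best_split(states: List[StateStats],
               questions: List[Question]) -> Optional[Tuple[str, float]]:
    """One greedy step: return (question, gain) with the largest log-likelihood gain."""
    parent = cluster_loglik(states)
    best: Optional[Tuple[str, float]] = None
    for name, ask in questions:
        yes = [s for s in states if ask(s)]
        no = [s for s in states if not ask(s)]
        if not yes or not no:  # question does not partition this node
            continue
        gain = cluster_loglik(yes) + cluster_loglik(no) - parent
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


# Invented 1-D statistics: states seen in "call" have mean 5, the others mean 0,
# and every state has variance 1 (sum_xx = count * (var + mean**2)).
DEMO_STATES = [
    StateStats("m", "ao", "call", 10.0, [50.0], [260.0]),
    StateStats("n", "ao", "dial", 10.0, [0.0], [10.0]),
    StateStats("k", "ao", "call", 10.0, [50.0], [260.0]),
    StateStats("t", "ao", "dial", 10.0, [0.0], [10.0]),
]

if __name__ == "__main__":
    print(best_split(DEMO_STATES, QUESTIONS))
```

On this toy data the left-nasal question puts one "call" state and one "dial" state on each side, so it yields no gain. The right-context question does not partition the node at all. The word question separates the two means and wins with a gain of 20·ln 7.25, about 39.6. When such a question is selected, the leaves below it are word-specific models, which is the effect the abstract describes. A full system would apply this step recursively and stop on thresholds such as minimum gain or minimum occupancy; those details are omitted here.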
Bibliographic reference. Liao, Hank / Alberti, Chris / Bacchiani, Michiel / Siohan, Olivier (2010): "Decision tree state clustering with word and syllable features", In INTERSPEECH-2010, 2958-2961.