The triphone model is widely used as an acoustic model because it effectively captures phonetic variation caused by coarticulation. However, the acoustic features of phonemes are also affected by other factors, such as speaking style and speaking rate. In this paper, we propose a new acoustic model. All training data that share the same phoneme context are automatically clustered into several clusters based on acoustic similarity, and a “sub-triphone” is trained on the training data belonging to each cluster.
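The clustering-then-training idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SSS-free procedure. Each training token is assumed to be summarized as one fixed-length feature vector. Plain k-means is a swapped-in stand-in for the paper's acoustic-similarity clustering, and a single diagonal Gaussian stands in for the model trained per cluster. The names `train_sub_triphones`, `kmeans` and `diag_gaussian`, the cluster count `k`, and label strings such as `a-k+i` are all hypothetical.

```python
import random
from collections import defaultdict


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (stand-in clustering); returns a cluster index per vector."""
    rng = random.Random(seed)
    k = min(k, len(vectors))  # cannot form more clusters than samples
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diag_gaussian(vectors, floor=1e-3):
    """Hypothetical per-cluster model: diagonal Gaussian with a variance floor."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, floor)
           for d in range(dim)]
    return mean, var


def train_sub_triphones(samples, k=2):
    """samples: list of (triphone_label, feature_vector).

    Groups tokens by triphone context, clusters each group, and trains one
    sub-model per (context, cluster) pair.
    """
    by_context = defaultdict(list)
    for label, vec in samples:
        by_context[label].append(vec)
    models = {}
    for label, vecs in by_context.items():
        assignments = kmeans(vecs, k)
        for c in sorted(set(assignments)):
            members = [v for v, a in zip(vecs, assignments) if a == c]
            models[(label, c)] = diag_gaussian(members)
    return models
```

With two well-separated groups of tokens for one context, the sketch yields two sub-models for that context, while a context with a single token yields one. In a real system the per-cluster Gaussian would be replaced by an HMM trained on the cluster's data.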
In experiments, the sub-triphone model achieved about 5% higher phoneme accuracy than the triphone model.
Bibliographic reference. Suzuki, Motoyuki / Honma, Daisuke / Ito, Akinori / Makino, Shozo (2009): "Detailed description of triphone model using SSS-free algorithm", In INTERSPEECH-2009, 1399-1402.