Sixth International Conference on Spoken Language Processing
HMM-based large-vocabulary speech recognition systems usually have a very large number of statistical parameters. To estimate these parameters more reliably, their number is reduced by sharing them across models. The parameter sharing is determined by regression trees, which are built using phonetic classes designed either by a human expert or by data-driven methods. In situations where neither of these is reliable, it may be useful to have non-decision-tree-based state-tying techniques that perform comparably to those based on traditional methods. In this paper we propose two methods for non-decision-tree-based parameter learning in HMM-based systems. In the first method (context-dependent state tying), we restructure the acoustic models to explicitly capture the transitions between phones in continuous speech. In the second method (transition-based subword units), we redefine the basic sound units used to model speech so that they explicitly model the transitions between sounds. Experiments show that context-dependent state tying is a viable option for large-vocabulary systems. They also show that transition-based subword units can improve performance on spontaneous speech.
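As a rough illustration of the two ideas, the sketch below shows one simple way each could be realized. It is not the authors' definition. The unit labels, the `SIL` boundary padding, and both function names (`phones_to_transition_units`, `tied_state_names`) are hypothetical. The first function assumes a diphone-style scheme in which each unit spans the transition between two adjacent phones. The second assumes a 3-state triphone whose entry state is tied by left context, whose middle state is shared across all contexts, and whose exit state is tied by right context. This gives a fixed tying rule that needs no decision tree.

```python
from typing import List, Tuple


def phones_to_transition_units(phones: List[str], boundary: str = "SIL") -> List[str]:
    """Map a phone sequence to hypothetical transition units "left-right".

    The sequence is padded with a boundary symbol so that the onset of the
    first phone and the offset of the last phone are also covered.
    """
    padded = [boundary] + list(phones) + [boundary]
    # One unit per adjacent phone pair, i.e. per phone-to-phone transition.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def tied_state_names(left: str, phone: str, right: str) -> Tuple[str, str, str]:
    """Hypothetical tying rule for a 3-state triphone left-phone+right.

    Entry state depends only on (left, phone), the middle state only on the
    phone, and the exit state only on (phone, right). Triphones that agree
    on those parts therefore share states without any decision tree.
    """
    return (f"{left}-{phone}[0]", f"{phone}[1]", f"{phone}+{right}[2]")


if __name__ == "__main__":
    print(phones_to_transition_units(["k", "ae", "t"]))
    # ['SIL-k', 'k-ae', 'ae-t', 't-SIL']
    print(tied_state_names("k", "ae", "t"))
    print(tied_state_names("b", "ae", "t"))  # shares the last two states
```

Under these assumptions, "cat" (k ae t) and "bat" (b ae t) differ only in the entry state of the `ae` model. The middle and exit states are pooled, so more data is available to train them.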
Bibliographic reference. Nedel, Jon P. / Singh, Rita / Stern, Richard M. (2000): "Phone transition acoustic modeling: application to speaker independent and spontaneous speech systems", In ICSLP-2000, vol.4, 572-575.