Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Phone Transition Acoustic Modeling: Application to Speaker Independent and Spontaneous Speech Systems

Jon P. Nedel, Rita Singh, Richard M. Stern

Department of Electrical and Computer Engineering and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

HMM-based large vocabulary speech recognition systems usually have a very large number of statistical parameters. To estimate them more reliably, the number of parameters is reduced by sharing them across models. The sharing is typically decided by decision trees, which are built using phonetic classes designed either by a human expert or by data-driven methods. In situations where neither of these is reliable, it may be useful to have techniques for non-decision-tree-based state tying that perform comparably to the traditional methods. In this paper we propose two methods for non-decision-tree-based parameter sharing in HMM-based systems. In the first method (context-dependent state tying), we restructure acoustic models to explicitly capture the transitions between phones in continuous speech. In the second method (transition-based subword units), we redefine the basic sound units used to model speech so that they model transitions between sounds explicitly. Experiments show that context-dependent state tying is a viable option for large vocabulary systems. They also show that using transition-based subword units can improve performance on spontaneous speech.
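To make the effect of parameter sharing concrete, the following is a minimal sketch of how tying HMM states across triphone models shrinks the number of distinct states to estimate. It is not the method proposed in the paper: the toy phone set, the 39-dimensional single-Gaussian state, and the tying rule in `tied_key` are all assumptions chosen only for illustration. The hypothetical rule ties the first state of a triphone by (left context, base phone), the last state by (base phone, right context), and the middle states by base phone alone, so no decision tree is involved.

```python
from itertools import product

# All values below are illustrative assumptions, not taken from the paper.
PHONES = ["AA", "B", "K", "S", "T"]   # toy phone set
STATES_PER_HMM = 3                    # left-to-right HMM with 3 emitting states
PARAMS_PER_STATE = 2 * 39             # mean + diagonal variance of one 39-dim Gaussian


def untied_state_count(phones, states_per_hmm):
    """Number of states when every triphone owns all of its states."""
    return len(phones) ** 3 * states_per_hmm


def tied_key(left, base, right, state_index, states_per_hmm):
    """Hypothetical tying rule (an assumption, not the authors' scheme)."""
    if state_index == 0:
        return ("entry", left, base)      # shared by triphones with same left context
    if state_index == states_per_hmm - 1:
        return ("exit", base, right)      # shared by triphones with same right context
    return ("middle", base, state_index)  # context-independent middle state(s)


def build_tying_map(phones, states_per_hmm):
    """Map each (left, base, right, state_index) to the key of its shared state."""
    return {
        (l, b, r, s): tied_key(l, b, r, s, states_per_hmm)
        for l, b, r in product(phones, repeat=3)
        for s in range(states_per_hmm)
    }


tying_map = build_tying_map(PHONES, STATES_PER_HMM)
n_untied = untied_state_count(PHONES, STATES_PER_HMM)
n_tied = len(set(tying_map.values()))

print("untied states:", n_untied, "parameters:", n_untied * PARAMS_PER_STATE)
print("tied states:  ", n_tied, "parameters:", n_tied * PARAMS_PER_STATE)
```

With these toy settings, every triphone state still has a model, but many triphones point at the same shared state, so far fewer Gaussians need to be estimated from the training data.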


Bibliographic reference. Nedel, Jon P. / Singh, Rita / Stern, Richard M. (2000): "Phone transition acoustic modeling: application to speaker independent and spontaneous speech systems", In ICSLP-2000, vol. 4, 572-575.