Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Asynchrony with Trained Transition Probabilities Improves Performance in Multi-Band Speech Recognition

Brian Mak, Yik-Cheung Tam

Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

One of the central themes in multi-band automatic speech recognition (ASR) is devising a strategy for recombining sub-band information. This in turn raises two questions: (1) at which phonetic unit should recombination take place, and (2) how asynchronously should the sub-bands be allowed to run? In theory, asynchronous multi-band ASR should perform at least as well as synchronous multi-band ASR; in the past few years, however, conflicting results have been reported on this issue. In this paper, we study asynchrony under the framework of HMM composition, in which a model-based recombination strategy recombines sub-band HMMs at the state level. We hypothesize that re-estimation of the transition probabilities is crucial for multi-band ASR using HMM composition. Experiments on connected TI digits show that, for both clean speech and noisy speech (with additive white noise at 10 dB), HMMs composed from sub-band HMMs whose transition probabilities are trained with the Baum-Welch algorithm outperform those whose transition probabilities are set uniformly (e.g., 0.5 in common left-to-right HMMs) by about 20%. Recombining sub-bands with a maximum asynchrony limit of one state gives a further 15% improvement over synchronous recombination on both clean and noisy speech. Finally, relaxing the asynchrony limit to more than one state degrades performance.
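To make the idea of state-level HMM composition with an asynchrony limit concrete, here is a minimal sketch in Python with NumPy for two sub-bands. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`left_to_right`, `compose`, `composite_log_emission`) are hypothetical; composite states are pairs of sub-band states whose indices differ by at most `max_async`; composite transition probabilities are assumed to be the product of the sub-band transition probabilities, renormalized after pruning disallowed pairs; and composite emission log-likelihoods are assumed to be an unweighted sum over sub-bands. The self-loop probabilities passed in can be either uniform (0.5) or values re-estimated by Baum-Welch, which is the contrast the abstract studies.

```python
import numpy as np


def left_to_right(n_states, stay):
    """Hypothetical helper: left-to-right transition matrix.

    `stay` holds the self-loop probabilities of the first n_states - 1 states
    (uniform 0.5 or Baum-Welch-trained values). The last state loops with
    probability 1.0; model exit is ignored in this sketch.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay[i]
        A[i, i + 1] = 1.0 - stay[i]
    A[-1, -1] = 1.0
    return A


def compose(A1, A2, max_async):
    """Compose two sub-band HMMs into one product-state HMM.

    Assumptions: a composite state (i, j) is kept only if |i - j| <= max_async;
    composite transitions are products of sub-band transitions, and each row
    is renormalized after the disallowed (too asynchronous) states are pruned.
    max_async = 0 gives synchronous recombination.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    states = [(i, j) for i in range(n1) for j in range(n2)
              if abs(i - j) <= max_async]
    A = np.zeros((len(states), len(states)))
    for k, (i, j) in enumerate(states):
        for k2, (i2, j2) in enumerate(states):
            A[k, k2] = A1[i, i2] * A2[j, j2]
        A[k] /= A[k].sum()  # renormalize the probability mass that survived pruning
    return states, A


def composite_log_emission(logb1, logb2, states):
    """Assumption: independent sub-band streams with equal weights, so the
    composite log-likelihood of state (i, j) is logb1[i] + logb2[j]."""
    return np.array([logb1[i] + logb2[j] for i, j in states])


if __name__ == "__main__":
    uniform = left_to_right(3, [0.5, 0.5])
    for k in (0, 1, 2):
        states, A = compose(uniform, uniform, max_async=k)
        print(k, len(states), states)
```

With three states per sub-band, a limit of 0 keeps only the diagonal pairs (3 composite states), a limit of 1 keeps 7, and a limit of 2 allows the full 9-state product; larger limits give the model more freedom but also more states and transitions to estimate.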



Bibliographic reference. Mak, Brian / Tam, Yik-Cheung (2000): "Asynchrony with trained transition probabilities improves performance in multi-band speech recognition", In ICSLP-2000, vol. 4, 149-152.