Sixth International Conference on Spoken Language Processing
One of the central themes in multi-band automatic speech recognition (ASR) is devising a strategy for recombining sub-band information. This in turn raises two questions: (1) At what phonetic unit should the recombination take place? (2) How asynchronously should the sub-bands be run? In theory, asynchronous multi-band ASR should perform at least as well as synchronous multi-band ASR. However, results reported over the past few years conflict on this issue. In this paper, we study the asynchrony issue under the framework of HMM composition, in which a model-based recombination strategy is used to recombine sub-band HMMs at the state level. We hypothesize that re-estimation of the transition probabilities is crucial for multi-band ASR (using HMM composition). Experiments on connected TI digits show that, for both clean speech and noisy speech (with additive white noise of 10 dB), HMMs composed from sub-band HMMs in which transition probabilities are trained with the Baum-Welch algorithm outperform those in which transition probabilities are set uniformly (e.g., 0.5 in common left-to-right HMMs) by about 20%. Recombining sub-bands with a maximum asynchrony limit of one state gives a further 15% improvement over synchronous recombination on both clean speech and noisy speech. Finally, relaxing asynchrony to more than one state results in worse performance.
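To make the idea of state-level recombination with an asynchrony limit concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a composite state is a tuple holding one state index per sub-band, that a composite state is allowed only when those indices differ by at most the asynchrony limit, and that a composite transition probability is the product of the sub-band transition probabilities, renormalized over the allowed destinations. The function names `compose_states` and `compose_transitions`, and the product-and-renormalize rule itself, are illustrative assumptions; the paper instead trains transition probabilities with Baum-Welch.

```python
from itertools import product


def compose_states(n_states, n_bands=2, max_async=1):
    """Enumerate composite states (one state index per sub-band).

    Assumption: a composite state is allowed only if the sub-band state
    indices differ by at most `max_async`.
    """
    return [s for s in product(range(n_states), repeat=n_bands)
            if max(s) - min(s) <= max_async]


def compose_transitions(trans, max_async=1):
    """Build composite transition probabilities from sub-band HMMs.

    trans[b][i][j] is P(next state j | current state i) for sub-band b.
    Assumption: the composite probability is the product of the sub-band
    probabilities, renormalized over the allowed destination states.
    """
    n_bands = len(trans)
    n_states = len(trans[0])
    states = compose_states(n_states, n_bands, max_async)
    composite = {}
    for src in states:
        row = {}
        for dst in states:
            p = 1.0
            for b in range(n_bands):
                p *= trans[b][src[b]][dst[b]]
            if p > 0.0:
                row[dst] = p
        total = sum(row.values())
        # Renormalize so that each row sums to one after pruning.
        composite[src] = {d: p / total for d, p in row.items()} if total else {}
    return composite


# Hypothetical 3-state left-to-right sub-band HMM with uniform (0.5)
# transitions; the final state loops on itself for simplicity.
uniform = [[0.5, 0.5, 0.0],
           [0.0, 0.5, 0.5],
           [0.0, 0.0, 1.0]]

sync = compose_transitions([uniform, uniform], max_async=0)
one_state = compose_transitions([uniform, uniform], max_async=1)
```

With two such 3-state sub-band models, synchronous recombination (`max_async=0`) keeps 3 composite states, a limit of one state keeps 7, and unrestricted asynchrony (`max_async=2`) keeps all 9, which illustrates how relaxing the limit enlarges the composite model.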
Bibliographic reference. Mak, Brian / Tam, Yik-Cheung (2000): "Asynchrony with trained transition probabilities improves performance in multi-band speech recognition", in ICSLP-2000, vol. 4, 149-152.