Sixth European Conference on Speech Communication and Technology
Multi-band speech recognition is an exploratory paradigm in which each frequency region is treated as a distinct source of information and the streams are combined after each is processed independently. A number of researchers have hypothesized that it is advantageous to combine the sub-frequency information in an asynchronous manner. This paper examines this hypothesis, using two different approaches to relaxing synchrony constraints: HMM decomposition/recombination and two-level dynamic programming (DP). Drawing on this work and that of others [2, 18], we conclude that relaxing the synchrony constraints indiscriminately for all phone-to-phone transitions does not consistently and significantly reduce the word error rate. The optimal permissible asynchrony must depend on both the phone-class transitions and the training-data statistics.
Bibliographic reference. Mirghafori, Nikki / Morgan, Nelson (1999): "Sooner or later: exploring asynchrony in multi-band speech recognition", In EUROSPEECH'99, 595-598.