Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Sooner or Later: Exploring Asynchrony in Multi-Band Speech Recognition

Nikki Mirghafori (1,2,3), Nelson Morgan (1,2)

(1) International Computer Science Institute, Berkeley, CA, USA
(2) University of California at Berkeley, EECS Department, Berkeley, CA, USA
(3) Nuance Communications, Menlo Park, CA, USA

Multi-band speech recognition is an exploratory paradigm in which each frequency region is treated as a distinct source of information, and the streams are combined after each has been processed independently. A number of researchers have hypothesized that it is advantageous to combine the sub-band information asynchronously. This paper examines that hypothesis using two different approaches to relaxing the synchrony constraints: HMM decomposition/recombination [19] and two-level dynamic programming (DP) [16]. Drawing on this work and that of others [2, 18], we conclude that relaxing the synchrony constraints indiscriminately for all phone-to-phone transitions does not yield a consistent, significant reduction in word error rate. The optimal permissible asynchrony must depend on both the phone-class transitions and the training-data statistics.


Bibliographic reference.  Mirghafori, Nikki / Morgan, Nelson (1999): "Sooner or later: exploring asynchrony in multi-band speech recognition", In EUROSPEECH'99, 595-598.