EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Fast Incremental Adaptation Using Maximum Likelihood Regression and Stochastic Gradient Descent

Sreeram V. Balakrishnan

IBM T.J. Watson Research Center, USA

Adaptation to a new speaker or environment is becoming increasingly important as speech recognition systems are deployed in unpredictable real-world situations. Constrained or Feature space Maximum Likelihood Regression (fMLLR) [1] has proved to be especially effective for this purpose, particularly when used for incremental unsupervised adaptation [2]. Unfortunately, the standard implementation described in [1], and used by most authors since, relies on statistics that take O(n^3) operations per frame to collect. In addition, these statistics need O(n^3) storage, and estimating the feature transform matrix requires O(n^4) operations. This cost is unacceptable for most embedded speech recognition systems. In this paper we show that the fMLLR objective function can be optimized using stochastic gradient descent in a way that achieves almost the same results as the standard implementation, with an algorithm that requires only O(n^2) operations per frame and O(n^2) storage. This order-of-magnitude saving allows continuous adaptation to be implemented in most resource-constrained embedded speech recognition applications.
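
The abstract does not spell out the update rule, so the following is only a rough sketch of what a per-frame stochastic gradient step on the fMLLR objective could look like; it is not the paper's algorithm. It assumes n is the feature dimension, that each frame x has been aligned to a single diagonal-covariance Gaussian with mean mu and variance var, and that the transform is y = A x + b. To avoid inverting A at every frame, the sketch substitutes the relative (natural) gradient from the independent component analysis literature for the plain gradient; that substitution is an assumption made here to keep each step at O(n^2). All function names, the learning rate, and the synthetic data are hypothetical.

```python
import numpy as np

def frame_objective(A, b, x, mu, var):
    """Per-frame fMLLR log-likelihood (up to a constant) for one diagonal Gaussian."""
    y = A @ x + b
    _, logabsdet = np.linalg.slogdet(A)
    return logabsdet - 0.5 * np.sum((y - mu) ** 2 / var)

def sgd_step(A, b, x, mu, var, lr):
    """One stochastic ascent step on the per-frame objective, O(n^2) work.

    The plain gradient w.r.t. A is inv(A).T - outer(e, x), which needs an
    inverse. Right-multiplying by A.T @ A (the relative-gradient trick, an
    assumption of this sketch) gives A - outer(e, z @ A) with z = A @ x,
    which needs only matrix-vector products.
    """
    z = A @ x                      # O(n^2)
    e = (z + b - mu) / var         # variance-scaled residual, O(n)
    A_new = A + lr * (A - np.outer(e, z @ A))
    b_new = b - lr * e             # plain gradient w.r.t. b is -e
    return A_new, b_new

def avg_objective(A, b, X, mu, var):
    return float(np.mean([frame_objective(A, b, x, mu, var) for x in X]))

def run_demo(seed=0, n_frames=4000, lr=0.005):
    """Synthetic check: frames are a scaled and shifted version of model-matched data."""
    rng = np.random.default_rng(seed)
    mu = np.array([0.0, 1.0, -1.0])
    var = np.array([1.0, 0.5, 2.0])
    y_true = mu + np.sqrt(var) * rng.standard_normal((n_frames, 3))
    X = (y_true - 0.5) / 1.5       # simulated mismatch; ideal fix is A = 1.5 I, b = 0.5
    A, b = np.eye(3), np.zeros(3)
    before = avg_objective(A, b, X, mu, var)
    for x in X:                    # one pass, one update per frame
        A, b = sgd_step(A, b, x, mu, var, lr)
    after = avg_objective(A, b, X, mu, var)
    return before, after, A, b

if __name__ == "__main__":
    before, after, A, b = run_demo()
    print("average objective before:", before)
    print("average objective after: ", after)
```

Only A and b are kept between frames, so storage is O(n^2), which is the property the abstract emphasizes. In a real recognizer the Gaussian for each frame would come from the decoder's alignment or posteriors rather than being fixed as it is here.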

Bibliographic reference.  Balakrishnan, Sreeram V. (2003): "Fast incremental adaptation using maximum likelihood regression and stochastic gradient descent", In EUROSPEECH-2003, 1521-1524.