EUROSPEECH 2003 - INTERSPEECH 2003

Adaptation to a new speaker or environment is becoming increasingly important as speech recognition systems are deployed in unpredictable real-world conditions. Constrained, or feature-space, Maximum Likelihood Linear Regression (fMLLR) [1] has proved especially effective for this purpose, particularly for incremental unsupervised adaptation [2]. Unfortunately, the standard implementation described in [1], and used by most authors since, needs O(n^3) operations per frame to collect its statistics, where n is the feature dimension. The statistics also require O(n^3) storage, and estimating the feature transform matrix requires O(n^4) operations. This cost is unacceptable for most embedded speech recognition systems. In this paper we show that the fMLLR objective function can be optimized using stochastic gradient descent in a way that achieves almost the same results as the standard implementation, with an algorithm that requires only O(n^2) operations per frame and O(n^2) storage. This order-of-magnitude saving makes continuous adaptation feasible in most resource-constrained embedded speech recognition applications.
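To make the idea of an O(n^2) per-frame update concrete, here is a minimal sketch of stochastic gradient ascent on the per-frame fMLLR objective log|det A| - 0.5 (Ax + b - mu)^T D (Ax + b - mu), with D the inverse diagonal covariance. It rests on several assumptions that are not taken from the paper: each frame is scored against a single known diagonal Gaussian (in a recognizer this would come from an alignment), the gradient with respect to A is right-multiplied by A^T A (a relative or natural gradient) so that no matrix inverse is needed, and all names (`sgd_step`, `demo`, the synthetic scale and shift) are hypothetical. It is one way to reach O(n^2) per frame, not necessarily the update the author used.

```python
import math
import random


def matvec(A, x):
    """Dense matrix-vector product, O(n^2)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def log_abs_det(A):
    """log|det A| via Gaussian elimination with partial pivoting.

    O(n^3); used only to monitor the objective, never inside the update.
    """
    M = [row[:] for row in A]
    n = len(M)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return float("-inf")
        M[c], M[p] = M[p], M[c]
        total += math.log(abs(M[c][c]))
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return total


def frame_objective(A, b, x, mu, inv_var):
    """Per-frame fMLLR objective (constants dropped) for one diagonal Gaussian:
    log|det A| - 0.5 * sum_i inv_var_i * (y_i - mu_i)^2, where y = A x + b."""
    y = [v + bi for v, bi in zip(matvec(A, x), b)]
    quad = sum(w * (yi - m) ** 2 for yi, m, w in zip(y, mu, inv_var))
    return log_abs_det(A) - 0.5 * quad


def sgd_step(A, b, x, mu, inv_var, lr):
    """One O(n^2) stochastic ascent step (hypothetical sketch).

    Euclidean gradient wrt A is A^{-T} - g x^T with g = D (A x + b - mu).
    Right-multiplying by A^T A (assumed relative gradient) gives
    (I - g (A x)^T) A, which needs only matrix-vector products.
    """
    n = len(x)
    ax = matvec(A, x)  # A x, O(n^2)
    g = [w * (v + bi - m) for v, bi, m, w in zip(ax, b, mu, inv_var)]
    ax_A = [sum(ax[i] * A[i][j] for i in range(n)) for j in range(n)]  # (A x)^T A
    new_A = [[A[i][j] + lr * (A[i][j] - g[i] * ax_A[j]) for j in range(n)]
             for i in range(n)]
    new_b = [bi - lr * gi for bi, gi in zip(b, g)]  # plain gradient step for b
    return new_A, new_b


def demo(num_frames=3000, lr=0.005, seed=0):
    """Adapt to a synthetic per-dimension scale-and-shift mismatch."""
    rng = random.Random(seed)
    mu, inv_var = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]   # hypothetical target Gaussian
    scale, shift = [1.5, 0.7, 1.2], [0.5, -0.5, 0.2]  # hypothetical distortion

    def frame():
        return [s * rng.gauss(0.0, 1.0) + t for s, t in zip(scale, shift)]

    def mean_obj(A, b, frames):
        return sum(frame_objective(A, b, x, mu, inv_var) for x in frames) / len(frames)

    held_out = [frame() for _ in range(500)]
    A = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    b = [0.0, 0.0, 0.0]
    before = mean_obj(A, b, held_out)
    for _ in range(num_frames):
        A, b = sgd_step(A, b, frame(), mu, inv_var, lr)
    return before, mean_obj(A, b, held_out)
```

Calling `demo()` returns the mean held-out objective before and after adaptation, and the second value should be larger. The O(n^3) log-determinant is evaluated only for monitoring; the update itself uses matrix-vector products and an outer-product update, so its per-frame cost and storage stay O(n^2).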
Bibliographic reference. Balakrishnan, Sreeram V. (2003): "Fast incremental adaptation using maximum likelihood regression and stochastic gradient descent", in EUROSPEECH 2003, pp. 1521-1524.