INTERSPEECH 2004 - ICSLP
This paper examines how any parameter in the typical front end of a speech recognizer can be rapidly and inexpensively adapted with usage. It focuses on three points: first, demonstrating that effective adaptation can be accomplished using stochastic gradient descent methods with low CPU and memory cost; second, showing that adaptation can be done at time scales small enough to make it effective with just a single utterance; and third, showing that using a prior on the parameter significantly improves adaptation performance on small amounts of data. It extends previous work on a stochastic gradient descent implementation of fMLLR and work on adapting any parameter in the front-end chain using general second-order optimization techniques. The framework for general stochastic gradient descent of any front-end parameter with a prior is presented, along with practical techniques to improve convergence. In addition, methods for obtaining the alignment at small time intervals before the end of the utterance are presented. Finally, it is shown experimentally that online causal adaptation can result in a 5-15% WER reduction across a variety of problem sets and noise conditions, even with just 1 or 2 utterances of adaptation data.
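As a rough illustration of the idea of stochastic gradient adaptation with a prior, the sketch below adapts a single hypothetical front-end parameter: an additive feature bias `b`. It is not the paper's method. It assumes each frame is scored against one fixed diagonal Gaussian (standing in for the aligned acoustic model state), that the prior on `b` is Gaussian, and that the prior gradient is spread evenly over the frames. The function name `sgd_adapt_bias` and all of its arguments are invented for this example.

```python
import numpy as np

def sgd_adapt_bias(frames, model_mean, model_var, prior_mean, prior_var,
                   step_size=0.05):
    """Frame-by-frame MAP-style update of a bias b applied as x + b.

    Assumptions (not from the paper): one fixed diagonal Gaussian models
    every frame, and the prior on b is Gaussian.
    """
    frames = np.asarray(frames, dtype=float)
    n = len(frames)
    b = np.array(prior_mean, dtype=float)  # start from the prior mean
    for x in frames:
        # Gradient of log N(x + b; model_mean, model_var) with respect to b.
        grad_ll = (model_mean - (x + b)) / model_var
        # Gradient of log N(b; prior_mean, prior_var), shared across n frames.
        grad_prior = (prior_mean - b) / (prior_var * n)
        # One cheap gradient ascent step per frame, so the update is causal.
        b = b + step_size * (grad_ll + grad_prior)
    return b

# Features shifted by -1 relative to a zero-mean, unit-variance model.
frames = np.full((200, 1), -1.0)
weak = sgd_adapt_bias(frames, np.zeros(1), np.ones(1), np.zeros(1), np.ones(1))
strong = sgd_adapt_bias(frames, np.zeros(1), np.ones(1), np.zeros(1),
                        np.full(1, 0.001))
print(weak, strong)
```

With a broad prior the estimate moves close to the bias that undoes the shift, while a tight prior holds it near the prior mean. That pull toward the prior is what limits overfitting when only a few frames or utterances are available.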
Bibliographic reference. Balakrishnan, Sreeram / Visweswariah, Karthik / Goel, Vaibhava (2004): "Stochastic gradient adaptation of front-end parameters", In INTERSPEECH-2004, 1-4.