September 22-25, 1997
Much recent work has addressed how to transform HMMs, typically trained in a speaker-independent fashion on clean training data, to be more representative of data from a particular speaker or acoustic environment. These transforms are trained on only a small amount of data, so large numbers of components must share the same transform. Normally, each component is constrained to use only one transform. This paper examines how to assign components to transforms optimally, in a maximum likelihood sense, and how to allow each component, or component grouping, to make use of many transformations. The theory is given for obtaining both the "weights" for each transform and the transforms given a set of weights. The techniques are evaluated on both speaker and environmental adaptation tasks.
Bibliographic reference. Gales, M. J. F. (1997): "Transformation smoothing for speaker and environmental adaptation", In EUROSPEECH-1997, 2067-2070.