This paper proposes a speaker adaptation technique that uses a nonlinear spectral transform based on Gaussian mixture models (GMMs). One of the most popular forms of speaker adaptation is based on linear transforms, e.g., maximum likelihood linear regression (MLLR). Although MLLR can use multiple transforms assigned according to regression classes, only a single linear transform is applied to each state. The proposed method performs nonlinear speaker adaptation based on a new likelihood function that combines the HMMs used for recognition with the GMMs used for the spectral transform. Moreover, the context dependency of the transforms can also be estimated in an integrated maximum likelihood (ML) fashion. In phoneme recognition experiments, the proposed technique performs better than the conventional approaches.
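To illustrate the contrast between a single linear transform and a GMM-based nonlinear one, the following is a minimal sketch of one common form of GMM-based spectral transform: a posterior-weighted mixture of per-component linear transforms. The abstract does not give the transform's exact form, so this form is an assumption and may differ from the paper's formulation. All names (`gmm_posteriors`, `nonlinear_transform`, `A`, `b`) and the diagonal-covariance choice are hypothetical, and the joint ML estimation with the HMMs is not shown.

```python
import numpy as np


def gmm_posteriors(x, weights, means, variances):
    """Return P(m | x) for a diagonal-covariance GMM (illustrative assumption).

    x: (D,) feature vector; weights: (M,); means, variances: (M, D).
    """
    diff = x - means
    # Per-component Gaussian log-likelihood with diagonal covariance.
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()


def nonlinear_transform(x, weights, means, variances, A, b):
    """Posterior-weighted sum of per-component linear transforms A[m] @ x + b[m].

    A: (M, D, D) matrices; b: (M, D) bias vectors (hypothetical parameters).
    With M == 1 this reduces to a single linear transform.
    """
    post = gmm_posteriors(x, weights, means, variances)
    per_component = np.einsum("mij,j->mi", A, x) + b  # shape (M, D)
    return post @ per_component


if __name__ == "__main__":
    # Toy parameters chosen only for illustration, not taken from the paper.
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [10.0, 10.0]])
    variances = np.ones((2, 2))
    A = np.stack([np.eye(2), 2.0 * np.eye(2)])
    b = np.array([[1.0, 1.0], [0.0, 0.0]])
    print(nonlinear_transform(np.array([0.0, 0.0]), weights, means, variances, A, b))
```

Because the component posteriors depend on the input vector, the overall mapping is nonlinear even though each component applies only a linear transform.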
Bibliographic reference. Hayashi, Toyohiro / Nankaku, Yoshihiko / Lee, Akinobu / Tokuda, Keiichi (2010): "Speaker adaptation based on nonlinear spectral transform for speech recognition", In INTERSPEECH-2010, 542-545.