12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Separating Speaker and Environmental Variability Using Factored Transforms

Michael L. Seltzer, Alex Acero

Microsoft Research, USA

Two primary sources of variability that degrade accuracy in speech recognition systems are the speaker and the environment. While many algorithms for speaker or environment adaptation have been proposed to improve performance, far less attention has been paid to approaches that address both factors. In this paper, we present a method for compensating for speaker and environmental mismatch using a cascade of constrained maximum likelihood linear regression (CMLLR) transforms. The proposed approach enables speaker transforms estimated in one environment to be effectively applied to speech from the same user in a different environment. This approach can be further improved using a new training method called speaker and environment adaptive training. When speaker transforms are applied to new environments, the proposed approach yields a 13% relative improvement over conventional CMLLR.
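To make the idea of a cascade of CMLLR transforms concrete, here is a minimal sketch of how two feature-space affine transforms might be applied one after the other. It assumes each transform has the usual CMLLR form x -> A x + b. It also assumes the environment transform is applied to the observed features first and the speaker transform second; this order is an illustrative assumption, not necessarily the paper's exact formulation. The helper names (`apply_affine`, `cascade`, `compose`, `cascade_log_jacobian`) are hypothetical. The sketch covers only how the transforms are applied, not how they are estimated. It shows two things. First, the cascade collapses to a single affine transform. Second, the log-Jacobian term that CMLLR adds to the likelihood is the sum of the two per-transform terms.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Affine = Tuple[Matrix, Vector]  # (A, b) for the map x -> A x + b


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def apply_affine(t: Affine, x: Vector) -> Vector:
    """Apply one CMLLR-style feature transform x -> A x + b."""
    A, b = t
    return [v + bi for v, bi in zip(matvec(A, x), b)]


def cascade(env: Affine, spk: Affine, x: Vector) -> Vector:
    """Assumed order: environment transform first, then speaker transform."""
    return apply_affine(spk, apply_affine(env, x))


def compose(env: Affine, spk: Affine) -> Affine:
    """Collapse the cascade into one transform: A = A_s A_e, b = A_s b_e + b_s."""
    (A_e, b_e), (A_s, b_s) = env, spk
    A = matmul(A_s, A_e)
    b = [v + bs for v, bs in zip(matvec(A_s, b_e), b_s)]
    return A, b


def log_abs_det(A: Matrix) -> float:
    """log|det A| via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            raise ValueError("singular transform")
        M[c], M[p] = M[p], M[c]  # row swap flips the sign only, not |det|
        total += math.log(abs(M[c][c]))
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return total


def cascade_log_jacobian(env: Affine, spk: Affine) -> float:
    """Log-Jacobian of the cascade: log|det A_s| + log|det A_e|."""
    return log_abs_det(spk[0]) + log_abs_det(env[0])


if __name__ == "__main__":
    env = ([[2.0, 0.0], [0.0, 1.0]], [1.0, 0.0])   # toy environment transform
    spk = ([[1.0, 1.0], [0.0, 1.0]], [0.0, -1.0])  # toy speaker transform
    x = [1.0, 2.0]
    print(cascade(env, spk, x))                      # cascaded transforms
    print(apply_affine(compose(env, spk), x))        # single equivalent transform
    print(cascade_log_jacobian(env, spk))
```

Because the speaker factor is kept as a separate (A, b) pair, the same `spk` could in principle be paired with a different `env` estimated for a new acoustic environment. This mirrors the abstract's goal of reusing speaker transforms across environments.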


Bibliographic reference.  Seltzer, Michael L. / Acero, Alex (2011): "Separating speaker and environmental variability using factored transforms", In INTERSPEECH-2011, 1097-1100.