ISCA Archive Interspeech 2008

DySANA: dynamic speech and noise adaptation for voice activity detection

Ron J. Weiss, Trausti Kristjansson

We describe a method for simultaneously tracking noise and speech levels for signal-to-noise ratio (SNR) adaptive speech endpoint detection. The method is based on the Kalman filter framework with switching observations and uses a dynamic distribution that 1) limits the rate of change of these levels, 2) enforces a range on the values of the two levels, and 3) enforces a ratio between the noise and signal levels. We call this a Lombard dynamic distribution, since it encodes the expectation that a speaker will increase his or her vocal intensity in noise. The method also employs a state transition matrix that encodes a prior on the states and provides a continuity constraint. The new method provides a 46.1% relative improvement in word error rate (WER) over a baseline GMM-based endpointer at 20 dB SNR.
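
The idea of tracking two levels under these three constraints can be illustrated with a much-simplified sketch. It is not the authors' algorithm: it assumes scalar frame energies in dB, makes a hard per-frame choice between a noise state and a speech state, and applies the constraints as simple clamps rather than through a dynamic distribution. The function name `track_levels` and every parameter value below are hypothetical choices made for illustration only.

```python
import math


def _loglik(y, mean, var):
    """Log-density of observation y under a Gaussian N(mean, var)."""
    return -0.5 * ((y - mean) ** 2 / var + math.log(2 * math.pi * var))


def track_levels(frames_db, noise0=30.0, speech0=60.0, q=0.5, r=25.0,
                 max_step=1.0, lo=0.0, hi=100.0, min_gap=6.0, stay=0.9):
    """Track noise and speech levels (dB) from per-frame energies (dB).

    All defaults are illustrative assumptions, not values from the paper.
    Returns a list of (is_speech, noise_level, speech_level) per frame.
    """
    noise, speech = noise0, speech0
    p_noise = p_speech = 10.0      # estimate variances for the two levels
    prev_is_speech = False
    out = []
    for y in frames_db:
        # Predict step: random-walk dynamics inflate both variances.
        p_noise += q
        p_speech += q

        # Switching observation: score the frame under each state, adding
        # a transition prior that favors staying in the previous state.
        ll_noise = _loglik(y, noise, p_noise + r)
        ll_speech = _loglik(y, speech, p_speech + r)
        ll_speech += math.log(stay if prev_is_speech else 1.0 - stay)
        ll_noise += math.log(1.0 - stay if prev_is_speech else stay)
        is_speech = ll_speech > ll_noise

        # Kalman update of the selected level only; the correction is
        # clamped to limit the rate of change (constraint 1).
        if is_speech:
            k = p_speech / (p_speech + r)
            speech += max(-max_step, min(max_step, k * (y - speech)))
            p_speech *= 1.0 - k
        else:
            k = p_noise / (p_noise + r)
            noise += max(-max_step, min(max_step, k * (y - noise)))
            p_noise *= 1.0 - k

        # Keep both levels in range (constraint 2) and keep speech at
        # least min_gap dB above noise (a crude stand-in for constraint 3).
        noise = min(max(noise, lo), hi - min_gap)
        speech = min(max(speech, noise + min_gap), hi)

        out.append((is_speech, noise, speech))
        prev_is_speech = is_speech
    return out
```

Calling `track_levels` on a list of frame energies yields one speech/non-speech decision per frame alongside the current level estimates; an endpointer could then be built on the decision sequence. The hard decision and the clamps are deliberate simplifications that keep the sketch short.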


doi: 10.21437/Interspeech.2008-29

Cite as: Weiss, R.J., Kristjansson, T. (2008) DySANA: dynamic speech and noise adaptation for voice activity detection. Proc. Interspeech 2008, 127-130, doi: 10.21437/Interspeech.2008-29

@inproceedings{weiss08_interspeech,
  author={Ron J. Weiss and Trausti Kristjansson},
  title={{DySANA: dynamic speech and noise adaptation for voice activity detection}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={127--130},
  doi={10.21437/Interspeech.2008-29}
}