ITRW on
We investigate the use of voicing in state-of-the-art Large Vocabulary Continuous Audio-visual automatic Speech Recognition (AV-LVCSR). We apply an original adaptive weighting function that uses the voicing level to estimate the appropriate combination weight for each modality. We show that state-of-the-art AV-LVCSR performance under speech noise can be improved by using a detector of the dominant speaker that is a function of the voicing level. We refine the weighting function according to the sensitivity and specificity of the dominant speaker detector. In this first experiment, the weighting functions are threshold functions of the voicing level. Rather than testing all possible thresholds, we arbitrarily choose three: one at which the detector's sensitivity reaches 95%, one at which its specificity reaches 95%, and one at which sensitivity and specificity are equal. Results show that the AV-LVCSR system we use is improved by 5.7% using a weighting function with high sensitivity to dominant speaker activity.
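The threshold selection described above can be illustrated with a minimal sketch. This is not the implementation used in the paper. It assumes per-frame voicing levels with reference labels marking frames where the dominant speaker is active, and it assumes that a higher voicing level indicates dominant-speaker activity. The function names, the weight values `w_high` and `w_low`, and the choice to apply the weight to the audio stream are hypothetical.

```python
def sensitivity_specificity(voicing, is_dominant, threshold):
    """Score a detector that fires when the voicing level is >= threshold.

    Assumes both classes (dominant and non-dominant frames) are present.
    """
    tp = fn = tn = fp = 0
    for level, dominant in zip(voicing, is_dominant):
        fired = level >= threshold
        if dominant and fired:
            tp += 1
        elif dominant:
            fn += 1
        elif fired:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def pick_thresholds(voicing, is_dominant, target=0.95):
    """Pick three operating points among the observed voicing levels."""
    stats = [
        (t, *sensitivity_specificity(voicing, is_dominant, t))
        for t in sorted(set(voicing))
    ]
    return {
        # Highest threshold whose sensitivity still reaches the target.
        "high_sensitivity": max(
            (t for t, se, sp in stats if se >= target), default=None
        ),
        # Lowest threshold whose specificity reaches the target.
        "high_specificity": min(
            (t for t, se, sp in stats if sp >= target), default=None
        ),
        # Threshold where sensitivity and specificity are closest.
        "equal_rate": min(stats, key=lambda s: abs(s[1] - s[2]))[0],
    }


def audio_weight(voicing_level, threshold, w_high=0.7, w_low=0.3):
    """Hypothetical threshold weighting: trust audio more when the
    detector fires. The weight values are placeholders, not the paper's."""
    return w_high if voicing_level >= threshold else w_low
```

Given labeled development frames, `pick_thresholds` returns the three candidate thresholds, and `audio_weight` turns any one of them into a step-shaped weighting function of the voicing level.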
Bibliographic reference. Glotin, Hervé (2001): "Dominant speaker detection based on voicing for adaptive audio-visual ASR robust to speech noise", In Adaptation-2001, 89-92.