This paper considers the improvement of speaker identification performance in reverberant conditions using additional lip information. Automatic speaker identification (ASI) using speech characteristics alone can be highly successful; however, problems arise when training and testing conditions are mismatched. In particular, we find that ASI performance drops dramatically when the system is trained on anechoic speech but tested on reverberant speech. Previous work [1][2] has shown that speaker-dependent information can be extracted from the static and dynamic qualities of moving lips. Given that lip information is unaffected by reverberation, we choose to fuse this additional information with the speech data. We propose a new method for estimating confidence levels to allow adaptive fusion of the audio and visual data. Identification results are presented for increasing levels of artificially reverberated data, where lip information is shown to provide a substantial improvement in ASI performance.
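To illustrate the general idea of confidence-driven audio-visual fusion described above, the following is a minimal sketch of generic confidence-weighted score-level fusion. The weighting scheme, score normalisation, and function names here are illustrative assumptions, not the specific method proposed in the paper.

```python
import numpy as np

# Illustrative sketch only: generic confidence-weighted score-level fusion of
# audio and lip (visual) classifier scores. The linear weighting used here is
# an assumption for illustration, not the paper's confidence estimation method.

def fuse_scores(audio_scores, visual_scores, audio_confidence):
    """Combine per-speaker audio and visual scores.

    audio_scores, visual_scores : arrays of shape (n_speakers,)
    audio_confidence            : value in [0, 1]; lower under mismatch
                                  (e.g. reverberant test speech), shifting
                                  weight toward the lip-based scores.
    """
    alpha = np.clip(audio_confidence, 0.0, 1.0)
    return alpha * audio_scores + (1.0 - alpha) * visual_scores

def identify(audio_scores, visual_scores, audio_confidence):
    """Return the index of the identified speaker from the fused scores."""
    fused = fuse_scores(audio_scores, visual_scores, audio_confidence)
    return int(np.argmax(fused))

# Example: three enrolled speakers; the audio is degraded by reverberation,
# so a low audio confidence lets the lip scores dominate the decision.
audio = np.array([-1.2, -0.9, -1.1])   # unreliable audio scores
visual = np.array([-2.0, -1.5, -0.4])  # lip-based scores
print(identify(audio, visual, audio_confidence=0.2))  # -> 2
```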
Cite as: Wark, T., Sridharan, S. (1998) Improving speaker identification performance in reverberant conditions using lip information. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0294, doi: 10.21437/ICSLP.1998-314
@inproceedings{wark98_icslp,
  author={Timothy Wark and Sridha Sridharan},
  title={{Improving speaker identification performance in reverberant conditions using lip information}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0294},
  doi={10.21437/ICSLP.1998-314}
}