This paper proposes a novel online incremental speaker adaptation technique for speech stream separation in telemedicine. An unsupervised discriminative linear regression technique, based on the principle of maximizing the class separation margin, is developed to transform model means. This adaptation approach is called largest margin linear regression (LMLR). Online incremental LMLR and maximum a posteriori (MAP) adaptation are performed on Gaussian mixture density based speaker models. A discounted sequential learning technique is proposed for LMLR to reduce the effect of unreliable initial models on unsupervised speaker model adaptation, and the adapted models from LMLR and MAP are combined to improve the accuracy of labeling speech segments as doctor or patient. Experimental results on telemedicine data show that LMLR is superior to maximum likelihood linear regression (MLLR) and that combining LMLR and MAP during online model adaptation is highly effective. The proposed technique significantly improved the performance of our earlier speech stream separation system, leading to nearly perfectly separated speech streams when judged by human listeners.
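As an illustration of the MAP step on Gaussian mixture speaker models mentioned above, here is a minimal Python sketch. The abstract does not give the update equations, so this assumes the widely used relevance-factor form of MAP mean adaptation with diagonal covariances. It does not implement LMLR, the discounted sequential learning, or the combination of adapted models. All function names, the `relevance` value, and the model layout are hypothetical.

```python
import math


def _component_logs(x, weights, means, variances):
    """Log of weight times diagonal-Gaussian density, per mixture component."""
    logs = []
    for w, mean, var in zip(weights, means, variances):
        ll = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)
        )
        logs.append(math.log(w) + ll)
    return logs


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return MAP-adapted means; weights and variances are left unchanged.

    Assumed update (not taken from the paper):
        mu_new = (sum_t gamma_t * x_t + r * mu_old) / (sum_t gamma_t + r)
    """
    k, d = len(means), len(means[0])
    counts = [0.0] * k
    sums = [[0.0] * d for _ in range(k)]
    for x in frames:
        logs = _component_logs(x, weights, means, variances)
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        for j, e in enumerate(exps):
            gamma = e / total  # posterior of component j given frame x
            counts[j] += gamma
            for i in range(d):
                sums[j][i] += gamma * x[i]
    return [
        [(sums[j][i] + relevance * means[j][i]) / (counts[j] + relevance)
         for i in range(d)]
        for j in range(k)
    ]


def label_segment(frames, models):
    """Pick the speaker whose model gives the highest average log-likelihood.

    `models` maps a label to a (weights, means, variances) tuple.
    """
    def avg_ll(model):
        total = 0.0
        for x in frames:
            logs = _component_logs(x, *model)
            top = max(logs)
            total += top + math.log(sum(math.exp(l - top) for l in logs))
        return total / len(frames)

    return max(models, key=lambda name: avg_ll(models[name]))
```

In an online setting, one could label each incoming segment with `label_segment` and then pass that segment's frames to `map_adapt_means` for the winning speaker, using the returned means as the prior for the next segment. This loop is an assumption about how such a system might be wired, not a description of the authors' procedure.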
Cite as: Hu, R., Xue, J., Zhao, Y. (2005) Incremental largest margin linear regression and MAP adaptation for speech separation in telemedicine applications. Proc. Interspeech 2005, 261-264, doi: 10.21437/Interspeech.2005-153
@inproceedings{hu05_interspeech,
  author={Rusheng Hu and Jian Xue and Yunxin Zhao},
  title={{Incremental largest margin linear regression and MAP adaptation for speech separation in telemedicine applications}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={261--264},
  doi={10.21437/Interspeech.2005-153}
}