ISCA Archive ICSLP 2000

A combined adaptive and decision tree based speech separation technique for telemedicine applications

Yunxin Zhao, Xiao Zhang, Xiaodong He, Laura Schopp

We present a novel technique for separating the doctor’s and the patient’s speech in conversations over a telemedicine network. The mixed speech signal acquired at the doctor’s site is first broken into single-talker speech segments and background using energy and duration thresholds. The speech segments are then identified as spoken by the doctor or the patient in two steps. In the first step, Gaussian mixture models (GMMs) of the doctor and the patient are used, where the doctor’s model is obtained from his/her training speech, and the patient’s model is initialized from a general speaker model and then adapted using the patient’s speech. In the second step, a decision tree that uses contextual and confidence features is applied to refine the identification results. Preliminary experiments were performed on three data sets collected in telemedicine sessions. Without adaptation or the decision tree, the segment-level and frame-level error rates were 25.44% and 16.53%, respectively. With adaptation, the segment and frame error rates were reduced to 13.11% and 7.85%, and with the decision tree added, they were further reduced to 10.48% and 6.73%, respectively.
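The first stage, segmentation by energy and duration thresholds, can be illustrated with a minimal sketch. The abstract does not say how the thresholds are applied, so everything below is an assumption made for illustration: the function name `segment_by_energy`, its parameters, the use of per-frame energies as input, and the rule of merging across short pauses before discarding short segments are all hypothetical and are not taken from the paper.

```python
def segment_by_energy(frame_energies, energy_threshold,
                      min_speech_frames, min_pause_frames):
    """Return (start, end) frame index pairs (end exclusive) for speech segments.

    Hypothetical sketch: all names and rules here are illustrative assumptions,
    not the method described in the paper.
    """
    # Step 1: a frame counts as speech if its energy exceeds the threshold.
    is_speech = [e > energy_threshold for e in frame_energies]

    # Step 2: collect runs of consecutive speech frames.
    runs = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(is_speech)))

    # Step 3: merge runs separated by a pause shorter than min_pause_frames.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_pause_frames:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # Step 4: drop segments shorter than min_speech_frames (treated as background).
    return [(s, e) for s, e in merged if e - s >= min_speech_frames]


# Toy per-frame energies: two longer bursts, one isolated high-energy frame.
energies = [0, 0, 5, 5, 5, 0, 5, 5, 0, 0, 0, 0, 5, 0, 0, 6, 6, 6, 6]
segments = segment_by_energy(energies, energy_threshold=1,
                             min_speech_frames=3, min_pause_frames=2)
print(segments)
```

In this toy input the one-frame pause at index 5 is bridged, and the isolated frame at index 12 is discarded as too short. In the technique described above, each remaining segment would then be scored against the doctor and patient models.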


Cite as: Zhao, Y., Zhang, X., He, X., Schopp, L. (2000) A combined adaptive and decision tree based speech separation technique for telemedicine applications. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 795-798

@inproceedings{zhao00_icslp,
  author={Yunxin Zhao and Xiao Zhang and Xiaodong He and Laura Schopp},
  title={{A combined adaptive and decision tree based speech separation technique for telemedicine applications}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={2},
  pages={795--798}
}