Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A Combined Adaptive and Decision Tree Based Speech Separation Technique for Telemedicine Applications

Yunxin Zhao, Xiao Zhang, Xiaodong He, Laura Schopp (1)

Dept. of Computer Engineering & Computer Science, (1) Dept. of Physical Medicine & Rehabilitation, University of Missouri, Columbia, MO, USA

We present a novel technique for separating doctor and patient speech in conversations over a telemedicine network. The mixed speech signal acquired at the doctor's site is first broken into single-talker speech segments and background by using thresholds on energy and duration. The speech segments are then identified as spoken by the doctor or the patient in two steps. In the first step, Gaussian mixture models (GMMs) of the doctor and the patient are used, where the doctor's model is obtained from his/her training speech, and the patient's model is initialized by a general speaker model and then adapted using the patient's speech. In the second step, a decision tree that uses contextual and confidence features is applied to refine the identification results. Preliminary experiments were performed on three data sets collected in telemedicine. Without adaptation and the decision tree, error rates at the segment level and frame level were 25.44% and 16.53%, respectively. With adaptation, the segment and frame error rates were reduced to 13.11% and 7.85%, and with the decision tree, they were further reduced to 10.48% and 6.73%, respectively.
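As a rough illustration of the first two stages described above, the following Python sketch segments a signal with energy and duration thresholds and then labels each segment by comparing its log-likelihood under two diagonal-covariance GMMs. It is not the authors' implementation: the function names, the frame-energy definition, the gap-merging rule, the threshold values, and the `(weight, means, variances)` model layout are all assumptions made for the example. Model adaptation and the decision-tree refinement are omitted.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed energy measure)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def segment_by_energy(energies, energy_thresh, min_speech_frames, min_gap_frames):
    """Return (start, end) frame ranges, end exclusive, judged to be speech.

    Frames above energy_thresh are speech candidates. Runs separated by a
    gap shorter than min_gap_frames are merged, and merged runs shorter
    than min_speech_frames are discarded as background.
    """
    runs, start = [], None
    for i, e in enumerate(energies):
        if e > energy_thresh:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(energies)))

    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap_frames:
            merged[-1] = (merged[-1][0], e)  # bridge the short pause
        else:
            merged.append((s, e))
    return [(s, e) for s, e in merged if e - s >= min_speech_frames]

def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples (hypothetical layout).
    """
    logs = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        logs.append(ll)
    mx = max(logs)  # log-sum-exp for numerical stability
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

def classify_segment(frames, doctor_gmm, patient_gmm):
    """Label a segment by the model with the higher total log-likelihood."""
    d = sum(gmm_log_likelihood(f, doctor_gmm) for f in frames)
    p = sum(gmm_log_likelihood(f, patient_gmm) for f in frames)
    return "doctor" if d >= p else "patient"

# Toy usage with made-up numbers.
energies = [0.0, 0.9, 0.8, 0.0, 0.7, 0.0, 0.0, 0.0, 0.6, 0.0]
segments = segment_by_energy(energies, 0.5, min_speech_frames=2, min_gap_frames=2)
doctor_gmm = [(1.0, [0.0], [1.0])]
patient_gmm = [(1.0, [3.0], [1.0])]
label = classify_segment([[2.9], [3.1]], doctor_gmm, patient_gmm)
```

In a fuller system along the lines of the abstract, the per-segment likelihood scores could also serve as confidence features for a later refinement stage, though how the authors define those features is not stated here.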

Bibliographic reference. Zhao, Yunxin / Zhang, Xiao / He, Xiaodong / Schopp, Laura (2000): "A combined adaptive and decision tree based speech separation technique for telemedicine applications", in ICSLP-2000, vol. 2, 795-798.