Sixth International Conference on Spoken Language Processing
October 16-20, 2000
A Combined Adaptive and Decision Tree Based Speech Separation Technique for Telemedicine Applications
Yunxin Zhao, Xiao Zhang, Xiaodong He, Laura Schopp (1)
Dept. of Computer Engineering & Computer Science,
(1) Dept. of Physical Medicine & Rehabilitation,
University of Missouri, Columbia, MO, USA
We present a novel technique for separation of doctor
and patient's speech in conversations over a
telemedicine network. The mixed speech signal
acquired at the doctor's site is first broken into single
talkers' speech segments and background by using
thresholds of energy and duration. The speech segments
are then identified as spoken by the doctor or the patient in two
steps. In the first step, Gaussian mixture models
(GMMs) of the doctor and patient are used, where the
doctor's model is obtained from his/her training speech,
and the patient's model is initialized by a general
speaker model and then adapted by the patient's speech.
In the second step, a decision tree that uses contextual
and confidence features is applied to refine the
identification results. Preliminary experiments were
performed on three data sets collected in telemedicine.
Without adaptation or the decision tree, error rates at the
segment level and frame level were 25.44% and
16.53%, respectively. With adaptation, segment and
frame error rates were reduced to 13.11% and 7.85%,
and with the decision tree, the error rates were further
reduced to 10.48% and 6.73%, respectively.
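
The first stage described above, splitting the signal into speech segments and background with thresholds of energy and duration, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame-based energy measure, the gap-merging rule, and all threshold values are assumptions made for illustration, and the sketch only separates speech from background without detecting talker changes.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed energy measure)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def segment_speech(energies, energy_thresh, min_speech_frames, min_gap_frames):
    """Return (start, end) frame index pairs, end exclusive, for speech segments.

    All three thresholds are hypothetical parameters, not values from the paper.
    """
    # Step 1: collect runs of consecutive frames whose energy exceeds the threshold.
    runs = []
    start = None
    for i, e in enumerate(energies):
        if e > energy_thresh:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(energies)))

    # Step 2: merge runs separated by a pause shorter than min_gap_frames.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap_frames:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Step 3: discard segments shorter than the minimum speech duration.
    return [(s, e) for s, e in merged if e - s >= min_speech_frames]
```

Each returned segment would then be passed to the speaker identification steps; everything outside the segments is treated as background.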
Zhao, Yunxin / Zhang, Xiao / He, Xiaodong / Schopp, Laura (2000):
"A combined adaptive and decision tree based speech separation technique for telemedicine applications",
In ICSLP-2000, vol.2, 795-798.