In multi-party teleconferencing, transporting separate speech streams to a particular user and then rendering the different streams spatially enables more efficient communication. A simple means of spatial presentation on the client side is binaural rendering with headphone presentation. For downward compatibility, e.g. when the transport mechanism does not support multiple parallel downlink streams, a system is proposed that combines automatic speaker classification with spatial rendering of the segregated streams. The combined system aims at better separability of the speakers than conventional systems provide. The paper details the two basic components, namely automatic speaker classification and binaural rendering. Based on a first evaluation of the approach, a proof of concept is provided, and directions for further improvement are discussed.
Bibliographic reference. Raake, Alexander / Spors, Sascha / Ahrens, Jens / Ajmera, Jitendra (2007): "Concept and evaluation of a downward-compatible system for spatial teleconferencing using automatic speaker clustering", In INTERSPEECH-2007, 1693-1696.