International Workshop on Hands-Free Speech Communication (HSC2001)

April 9-11, 2001
Kyoto, Japan

Hands-Free Telecommunications

B. H. Juang and F. K. Soong

Bell Labs, Lucent Technologies, Murray Hill, NJ, USA

Introduction

Hands-free telecommunication (both telephony and teleconferencing) refers to a communication mode in which the participants speak and interact with each other over a communication network without having to wear or hold any special device such as a microphone or a headset. Hands-free telecommunication generally takes place in an enclosed space, such as an office or the cabin of a car. The room is equipped with a transducer assembly (a microphone or an array of microphones), which picks up the acoustic signal in the room, and a loudspeaker, which plays out the signal from the remote end. In such a configuration, the talker in the room is usually situated at a comfortable distance from both the microphone assembly and the loudspeaker. This scenario is quite different from traditional telephony, in which a telephone with a handset is used. A related device that exists today, the speakerphone, offers a primitive form of hands-free telecommunication.

There are a number of strong motivations behind hands-free telecommunication. First, people want mobility, even if only locally. Communication using a tethered device is both inconvenient and undesirable; the purpose of a speakerphone is precisely to free the user from a locally tethered device. Second, in mobile communications, the concern over safety is growing; many municipalities and countries are enacting legislation that prohibits drivers from using a hand-held cellular phone while driving. A hands-free communication device installed in the car would alleviate the distraction caused by a hand-held cellular phone. Third, people are constantly looking for improved communication quality and a more natural interface to communication services. A speakerphone that uses "gain switching" to enable a limited form of full-duplex conversation cannot deliver speech quality high enough to allow proper use of a remote speech recognizer or to support multi-point teleconferencing without frustrating the participants. Many signal processing problems must be solved before high-quality hands-free telecommunication can be realized.

A number of critical technical issues are involved in the hands-free telecommunication paradigm. The acoustic signal picked up by the microphone includes 1) the speech and acoustic background of the far end played out from the loudspeaker (which we refer to as "echo" or "system echo" to distinguish it from reverberation below), 2) the near-end talker's speech, and 3) a substantial amount of near-end ambient noise, including speech from unauthorized or unintended talkers. Unlike a handset, which responds primarily to the talker holding the device, the microphone used in hands-free communication receives a multiplicity of sound sources. For most current automatic speech recognition systems, which are designed to respond to a single talker's speech, this is a major source of performance degradation. In addition, since the talker and the microphone are not expected to stay at a fixed relative position, the sound quality in the system will vary.
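The three components listed above can be illustrated with a small additive model. This is only a sketch under stated assumptions: the loudspeaker-to-microphone path is treated as a fixed linear filter (the hypothetical `echo_path`), the components are assumed to add linearly, and all names and sample values are invented for illustration rather than taken from the paper.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def microphone_signal(far_end, echo_path, near_speech, noise):
    """Sum echo, near-end speech, and ambient noise sample by sample.

    Assumption: the echo is the far-end signal filtered by a linear,
    time-invariant loudspeaker-to-microphone path (echo_path).
    The result is truncated to the shortest of the three components.
    """
    echo = convolve(far_end, echo_path)
    n = min(len(echo), len(near_speech), len(noise))
    return [echo[k] + near_speech[k] + noise[k] for k in range(n)]


# Toy values, chosen only to make the arithmetic easy to follow.
far_end = [1.0, 0.0, 0.0, 0.0]       # a single click from the remote end
echo_path = [0.5, 0.25]              # hypothetical two-tap echo path
near_speech = [0.0, 0.0, 1.0, 0.0]   # near-end talker active at sample 2
noise = [0.01, 0.01, 0.01, 0.01]     # constant stand-in for ambient noise

mic = microphone_signal(far_end, echo_path, near_speech, noise)
```

In this toy case the first two microphone samples are dominated by echo and the third by the near-end talker, which is the separation problem a recognizer designed for a single talker is not built to handle.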

The acoustic signal entering the system depends strongly on the room as well as on the type of microphone that the system uses. The room has two essential effects on the acoustic signal, colorization and reverberation. Colorization refers to the change in short-time spectral shape that the room or the microphone imposes on the source signal. Reverberation is governed by the impulse response of the room, whose effective length is a function of the room configuration (the geometric shape of the room and the acoustic reflectivity of the walls). When the impulse response is more than a few tenths of a second long, the reverberation becomes noticeably disturbing. Both a human listener and a speech recognizer react negatively to reverberation.
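The text does not say how the "effective length" of an impulse response is measured. As one possible illustration, the sketch below uses Schroeder backward integration and reports the time at which the remaining energy has dropped 60 dB below the total. The 60 dB threshold, the 8 kHz sampling rate, the synthetic exponentially decaying response, and all function names are assumptions made for this example.

```python
import math


def energy_decay_db(impulse_response):
    """Remaining energy after each sample, in dB relative to the total
    (Schroeder backward integration)."""
    remaining = [0.0] * len(impulse_response)
    tail = 0.0
    for n in range(len(impulse_response) - 1, -1, -1):
        tail += impulse_response[n] ** 2
        remaining[n] = tail
    total = remaining[0] if remaining else 0.0
    if total <= 0.0:
        return [float("-inf")] * len(impulse_response)
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in remaining]


def effective_length_seconds(impulse_response, sample_rate, drop_db=60.0):
    """Time until the decay curve first falls drop_db below the total energy.

    Returns None if the response never decays that far.
    """
    for n, level in enumerate(energy_decay_db(impulse_response)):
        if level <= -drop_db:
            return n / sample_rate
    return None


# Synthetic room response: a pure exponential decay (an idealization).
sample_rate = 8000
room_response = [0.999 ** n for n in range(16000)]
length_s = effective_length_seconds(room_response, sample_rate)
```

For this synthetic response the result is roughly 0.86 s, well beyond the "few tenths of a second" at which the text says reverberation becomes disturbing.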

Another important issue is the support of natural communication interaction, such as full-duplex operation, which allows both ends to speak at the same time without disruption, and barge-in in speech recognition. With hands-free communication, participants expect, and are expected, to behave as if a face-to-face conversation were taking place. Thus, interruptions, barge-ins, and even sidebars occur more frequently than when using a telephone handset. The technical implications of these behaviors are the need for an improved speech activity detector and for a reliable echo canceler that allows full-duplex communication and proper barge-in for natural human-machine interaction. Given the recognition accuracy of today's speech recognizers, integrating an echo canceler into the system for natural interaction is still an interesting challenge. These issues, namely multiple sound sources, echo and reverberation, and naturalness in interactive behaviors, make hands-free telecommunication one of the most intriguing engineering problems of modern times.
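To make the idea of an echo canceler concrete, here is a minimal sketch of one widely used approach, a normalized LMS (NLMS) adaptive filter. The paper does not state which algorithm the authors have in mind, so the choice of NLMS, the tap count, the step size, and the synthetic three-tap echo path are all assumptions. The sketch also omits double-talk handling, which is where the speech activity detector mentioned above would come in.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Adaptively estimate the echo path and subtract the predicted echo.

    Returns (residual, weights): the microphone signal with the estimated
    echo removed, and the final filter taps.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * u for w, u in zip(weights, history))
        error = d - estimate
        # Normalizing by the input energy keeps the update stable
        # regardless of the far-end signal level.
        norm = sum(u * u for u in history) + eps
        weights = [w + step * error * u / norm
                   for w, u in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Synthetic test signal: white noise through a hypothetical echo path,
# with no near-end speech and no ambient noise.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.6, -0.3, 0.1]
mic = [sum(true_path[k] * far_end[n - k]
           for k in range(len(true_path)) if n - k >= 0)
       for n in range(len(far_end))]

residual, weights = nlms_echo_canceller(far_end, mic)
```

Under these idealized conditions the first three taps converge to the assumed echo path and the residual decays toward zero. With a near-end talker present, the adaptation would have to be slowed or frozen during double talk, which this sketch does not attempt.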

Here, we review the progress towards hands-free telecommunication over the past two decades, particularly from the viewpoint of signal restoration, and draw out new technical dimensions that may stimulate additional advances. We address the issues in terms of the noise problem, the duplex problem, the colorization problem, and the reverberation problem. For each problem area, we highlight the significance of key advances and point out new technical directions. We conclude the presentation with a discussion of the natural interaction problem, based on our experience in designing a hands-free voice user interface for automobiles.



Bibliographic reference.  Juang, B. H. / Soong, F. K. (2001): "Hands-free telecommunications", In HSC2001, 5-10.