This paper presents a method for estimating a talker's head orientation using only a single microphone. Phoneme hidden Markov models (HMMs) of clean speech are used to separate the acoustic transfer function, which depends on the user's position and head orientation, from the observed speech. The frame sequence of the acoustic transfer function is estimated by maximizing the likelihood of training data uttered from a given position with a given head orientation. A support vector machine (SVM) is trained in advance on the separated frame sequences to discriminate the user's position and head orientation. Then, for each test utterance, the frame sequence of the acoustic transfer function is separated by maximum likelihood estimation using the label sequence obtained from phoneme recognition, and the user's position and head orientation are estimated by classifying the separated acoustic transfer function with the SVM. The effectiveness of the method was confirmed by talker localization and head orientation estimation experiments performed in a real environment.
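The separate-then-discriminate pipeline summarized above can be illustrated with a toy sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: features are cepstral, so the transfer function is additive (O_t = S_t + H_t); each phoneme label is modeled by a single Gaussian with a known mean, in which case the frame-wise maximum likelihood estimate reduces to H_t = O_t - mu; and a nearest-centroid classifier stands in for the SVM to keep the example dependency-free. All names (`CLEAN_MEANS`, `TRUE_H`, `simulate_utterance`, and the class labels "front" and "back") are hypothetical.

```python
import random

# Hypothetical clean-speech model: one cepstral mean vector per phoneme label.
CLEAN_MEANS = {"a": [1.0, 0.5], "i": [-0.5, 1.2], "u": [0.3, -0.8]}

def estimate_transfer_function(observed, labels, clean_means):
    """Frame-wise estimate H_t = O_t - mu_{q_t}.

    Assumes O_t = S_t + H_t and one Gaussian per label; under those
    assumptions this difference maximizes the likelihood of each frame.
    """
    return [[o - m for o, m in zip(frame, clean_means[label])]
            for frame, label in zip(observed, labels)]

def mean_vector(frames):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def train_centroids(features, classes):
    """Stand-in for SVM training: store one mean feature per class."""
    grouped = {}
    for feat, cls in zip(features, classes):
        grouped.setdefault(cls, []).append(feat)
    return {cls: mean_vector(feats) for cls, feats in grouped.items()}

def classify(feature, centroids):
    """Return the class whose centroid is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: sq_dist(feature, centroids[cls]))

def simulate_utterance(true_h, n_frames, rng):
    """Draw synthetic frames: clean mean + transfer function + small noise."""
    names = sorted(CLEAN_MEANS)
    labels = [rng.choice(names) for _ in range(n_frames)]
    observed = [[m + h + rng.gauss(0.0, 0.1)
                 for m, h in zip(CLEAN_MEANS[lab], true_h)]
                for lab in labels]
    return observed, labels

# Synthetic "training data": two head orientations with made-up transfer functions.
rng = random.Random(0)
TRUE_H = {"front": [0.5, -0.2], "back": [-0.4, 0.3]}
features, classes = [], []
for cls, h in TRUE_H.items():
    for _ in range(5):
        obs, labs = simulate_utterance(h, 40, rng)
        h_frames = estimate_transfer_function(obs, labs, CLEAN_MEANS)
        features.append(mean_vector(h_frames))
        classes.append(cls)
centroids = train_centroids(features, classes)

# Test utterance: labels play the role of the phoneme recognition output.
test_obs, test_labs = simulate_utterance(TRUE_H["back"], 40, rng)
test_feat = mean_vector(estimate_transfer_function(test_obs, test_labs, CLEAN_MEANS))
print(classify(test_feat, centroids))
```

In this sketch the per-frame estimates are averaged into one vector per utterance before classification; whether the method pools frames this way is not specified in the abstract, and a real system would use trained phoneme HMMs and an actual SVM in place of the fixed means and centroid rule.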
Bibliographic reference. Takashima, Ryoichi / Takiguchi, Tetsuya / Ariki, Yasuo (2011): "Single-channel head orientation estimation based on discrimination of acoustic transfer function", In INTERSPEECH-2011, 2721-2724.