11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Applying Geometric Source Separation for Improved Pitch Extraction in Human-Robot Interaction

Martin Heckmann (1), Claudius Gläser (1), Frank Joublin (1), Kazuhiro Nakadai (2)

(1) Honda Research Institute Europe GmbH, Germany
(2) Honda Research Institute Japan Co. Ltd., Japan

We present a system for robust pitch extraction in noisy and echoic environments. It consists of a multi-channel signal enhancement stage, a biologically inspired pitch extraction algorithm, and a pitch tracker based on a Bayesian filter. The signal enhancement stage uses 8-channel Geometric Source Separation (GSS). For pitch extraction, we apply a Gammatone filter bank and then compute a histogram of zero-crossing distances from the band-pass signals. While the histogram is computed, spurious side peaks at harmonics and sub-harmonics of the true fundamental frequency are inhibited. The subsequent grid-based Bayesian tracker performs Bayesian filtering in a forward step and Bayesian smoothing in a backward step. We evaluate the system in a realistic human-robot interaction scenario with several male and female speakers, and compare it against two well-established pitch extraction frameworks: get_f0, as included in the WaveSurfer toolkit, and Praat.
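
The forward filtering and backward smoothing steps of a grid-based Bayesian tracker can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the grid, the transition model, or the observation model, so the Gaussian random-walk transition over grid bins, the uniform prior, and the function names gaussian_transition and filter_and_smooth are all assumptions made for illustration. Each frame's likelihood is assumed to be a non-negative score per pitch bin, for example a normalized zero-crossing distance histogram.

```python
import math


def gaussian_transition(n_bins, sigma_bins):
    """Assumed model: row-stochastic matrix for a random walk over grid bins."""
    rows = []
    for i in range(n_bins):
        w = [math.exp(-0.5 * ((j - i) / sigma_bins) ** 2) for j in range(n_bins)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def normalize(p):
    """Scale p to sum to one; fall back to uniform if all entries are zero."""
    total = sum(p)
    if total <= 0.0:
        return [1.0 / len(p)] * len(p)
    return [x / total for x in p]


def filter_and_smooth(likelihoods, trans, prior=None):
    """Return (filtered, smoothed) posteriors over the grid for each frame.

    likelihoods: list of per-frame lists, one non-negative value per bin.
    trans: row-stochastic matrix, trans[i][j] = P(bin j at t+1 | bin i at t).
    """
    if not likelihoods:
        return [], []
    n = len(trans)
    prior = prior or [1.0 / n] * n

    # Forward step: predict with the transition model, then weight by the
    # likelihood of the current frame and renormalize.
    predicted, filtered = [], []
    belief = prior
    for t, lik in enumerate(likelihoods):
        if t == 0:
            pred = list(prior)
        else:
            pred = [sum(belief[i] * trans[i][j] for i in range(n))
                    for j in range(n)]
        belief = normalize([lik[j] * pred[j] for j in range(n)])
        predicted.append(pred)
        filtered.append(belief)

    # Backward step: refine each filtered posterior with information from
    # later frames, starting from the last frame.
    smoothed = [None] * len(likelihoods)
    smoothed[-1] = filtered[-1]
    for t in range(len(likelihoods) - 2, -1, -1):
        ratio = [smoothed[t + 1][j] / predicted[t + 1][j]
                 if predicted[t + 1][j] > 0.0 else 0.0
                 for j in range(n)]
        smoothed[t] = normalize(
            [filtered[t][i] * sum(trans[i][j] * ratio[j] for j in range(n))
             for i in range(n)])
    return filtered, smoothed
```

In this sketch, a frame whose likelihood is flat (for instance an unvoiced or heavily masked frame) keeps an uninformative filtered posterior if no earlier evidence exists, while the smoothed posterior for the same frame is pulled toward the pitch bins supported by later frames. That is the practical benefit of adding the backward step to the forward filter.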

Bibliographic reference. Heckmann, Martin / Gläser, Claudius / Joublin, Frank / Nakadai, Kazuhiro (2010): "Applying geometric source separation for improved pitch extraction in human-robot interaction", in INTERSPEECH-2010, 2602-2605.