We present a system for robust pitch extraction in noisy and echoic environments. It consists of a multi-channel signal enhancement stage, a biologically inspired pitch extraction algorithm, and a pitch tracker based on a Bayesian filter. The signal enhancement stage uses 8-channel Geometric Source Separation (GSS). For pitch extraction, we apply a Gammatone filter bank and then compute a histogram of zero-crossing distances from the band-pass signals. While the histogram is computed, spurious side peaks at harmonics and sub-harmonics of the true fundamental frequency are inhibited. The subsequent grid-based Bayesian tracker performs Bayesian filtering in a forward step and Bayesian smoothing in a backward step. We evaluate the system in a realistic human-robot interaction scenario with several male and female speakers, and compare it with two well-established pitch extraction frameworks: get_f0, included in the WaveSurfer Toolkit, and Praat.
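The forward filtering and backward smoothing steps of a grid-based Bayesian tracker can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat prior, and the Gaussian random-walk transition model are assumptions made for illustration, and the per-frame likelihoods stand in for whatever evidence the pitch extraction stage provides over a discrete pitch grid.

```python
import math


def normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def random_walk_transition(num_cells, sigma_cells):
    """Assumed transition model: pitch drifts by a Gaussian number of grid cells.

    Row i holds P(next cell = j | current cell = i).
    """
    return [
        normalize([math.exp(-0.5 * ((i - j) / sigma_cells) ** 2)
                   for j in range(num_cells)])
        for i in range(num_cells)
    ]


def forward_backward(likelihoods, transition):
    """Return (filtered, smoothed) posteriors over the pitch grid for each frame.

    likelihoods: list of frames, each a list of per-cell observation likelihoods.
    transition:  row-stochastic matrix as produced by random_walk_transition.
    """
    if not likelihoods:
        return [], []
    num_cells = len(transition)
    cells = range(num_cells)

    # Forward step: predict with the transition model, then weight by the data.
    predicted, filtered = [], []
    belief = [1.0 / num_cells] * num_cells  # assumed flat prior
    for t, frame in enumerate(likelihoods):
        if t > 0:
            belief = [sum(filtered[-1][i] * transition[i][j] for i in cells)
                      for j in cells]
        predicted.append(belief)
        filtered.append(normalize([p * l for p, l in zip(belief, frame)]))

    # Backward step: revise each filtered posterior using the later frames.
    smoothed = [None] * len(filtered)
    smoothed[-1] = filtered[-1]
    for t in range(len(filtered) - 2, -1, -1):
        ratio = [s / p if p > 0.0 else 0.0
                 for s, p in zip(smoothed[t + 1], predicted[t + 1])]
        smoothed[t] = normalize(
            [filtered[t][i] * sum(transition[i][j] * ratio[j] for j in cells)
             for i in cells])
    return filtered, smoothed


# Toy example: five grid cells, evidence mostly on cell 2, one ambiguous frame.
clean_frame = [0.05, 0.05, 0.8, 0.05, 0.05]
ambiguous_frame = [0.1, 0.1, 0.3, 0.1, 0.4]
frames = [clean_frame, clean_frame, ambiguous_frame, clean_frame, clean_frame]
_, smoothed_posteriors = forward_backward(frames, random_walk_transition(5, 1.0))
pitch_track = [max(range(5), key=row.__getitem__) for row in smoothed_posteriors]
print(pitch_track)
```

In this sketch the forward step uses only past and present frames, while the backward step lets later frames revise earlier estimates, which is why smoothing requires the whole utterance (or a look-ahead buffer) before the final track is available.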
Bibliographic reference. Heckmann, Martin / Gläser, Claudius / Joublin, Frank / Nakadai, Kazuhiro (2010): "Applying geometric source separation for improved pitch extraction in human-robot interaction", In INTERSPEECH-2010, 2602-2605.