Detection of voiced speech and estimation of the pitch frequency are
important tasks for many speech processing algorithms. Pitch information
can be used, e.g., to reconstruct voiced speech corrupted by noise.
In automotive environments, driving noise especially affects voiced
speech portions in the lower frequencies. Pitch estimation is therefore
important, e.g., for in-car communication systems. Such systems amplify
the driver’s voice and allow for convenient conversations with
backseat passengers. This application requires low latency, which in turn
calls for short window lengths and short frame shifts between consecutive
frames. Conventional pitch estimation techniques,
however, rely on long windows that exceed the pitch period of human
speech. In particular, male speakers’ low pitch frequencies are
difficult to resolve.
In this publication, we
introduce a technique that approaches pitch estimation from a different
perspective. The pitch information is extracted based on phase differences
between multiple low-resolution spectra instead of a single long window.
The technique benefits from the high temporal resolution provided by
the short frame shift and is capable of dealing with the low spectral
resolution caused by short window lengths. Using the new approach,
even very low pitch frequencies can be estimated with low computational complexity.
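The underlying principle, that the phase advance of a spectral bin over a short frame shift resolves frequency much more finely than the bin spacing of a short window, can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes a single complex exponential (so negative-frequency leakage plays no role), a periodic Hann window, and hypothetical helper names (`stft_bin`, `princarg`, `phase_diff_frequency`), whereas the published method combines multiple low-resolution spectra and targets voiced speech rather than a single tone.

```python
import cmath
import math


def stft_bin(x, start, n_win, k):
    """Hann-windowed DFT coefficient of bin k for the frame x[start:start+n_win].

    The time origin is the first sample of the frame (illustrative helper).
    """
    acc = 0j
    for n in range(n_win):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_win)  # periodic Hann window
        acc += w * x[start + n] * cmath.exp(-2j * math.pi * k * n / n_win)
    return acc


def princarg(phi):
    """Wrap a phase value to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def phase_diff_frequency(x, fs, n_win, hop, k, start=0):
    """Estimate the frequency (Hz) of the component dominating bin k from the
    phase advance between two frames that are `hop` samples apart."""
    x0 = stft_bin(x, start, n_win, k)
    x1 = stft_bin(x, start + hop, n_win, k)
    dphi = cmath.phase(x1 * x0.conjugate())  # measured phase advance (wrapped)
    omega_k = 2 * math.pi * k / n_win        # bin center in rad/sample
    # Deviation from the phase advance expected at the bin center, wrapped.
    dev = princarg(dphi - omega_k * hop)
    omega = omega_k + dev / hop              # refined frequency in rad/sample
    return omega * fs / (2 * math.pi)


if __name__ == "__main__":
    fs = 8000    # sampling rate in Hz (assumed)
    n_win = 64   # 8 ms window -> 125 Hz bin spacing
    hop = 8      # 1 ms frame shift
    f0 = 100.0   # low "pitch" tone, between bins 0 and 1
    x = [cmath.exp(2j * math.pi * f0 * n / fs) for n in range(n_win + hop)]
    est = phase_diff_frequency(x, fs, n_win, hop, k=1)
    print(round(est, 3))  # prints 100.0
```

Although the 64-sample window only resolves multiples of 125 Hz, the phase difference over the 8-sample hop recovers the 100 Hz tone. The wrapped deviation is unambiguous only while the true frequency lies within fs/(2·hop) of the bin center (±500 Hz here), so a shorter frame shift widens this range, which matches the abstract's point that the short frame shift is an asset rather than a burden.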
Cite as: Graf, S., Herbig, T., Buck, M., Schmidt, G. (2017) Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra. Proc. Interspeech 2017, 2316-2320, doi: 10.21437/Interspeech.2017-1254
@inproceedings{graf17_interspeech, author={Simon Graf and Tobias Herbig and Markus Buck and Gerhard Schmidt}, title={{Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra}}, year=2017, booktitle={Proc. Interspeech 2017}, pages={2316--2320}, doi={10.21437/Interspeech.2017-1254} }