This paper introduces a two-dimensional (2-D) processing approach for the analysis of multi-pitch speech sounds. Our framework invokes the short-space 2-D Fourier transform magnitude of a narrowband spectrogram, mapping harmonically-related signal components to multiple concentrated entities in a new 2-D space. First, localized time-frequency regions of the spectrogram are analyzed to extract pitch candidates. These candidates are then combined across multiple regions to obtain separate pitch estimates of each speech-signal component at a single point in time. We refer to this as multi-region analysis (MRA). By explicitly accounting for pitch dynamics within localized time segments, this separability is distinct from that which can be obtained using short-time autocorrelation methods typically employed in state-of-the-art multi-pitch tracking algorithms. We illustrate the feasibility of MRA for multi-pitch estimation on mixtures of synthetic and real speech.
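The mapping from harmonic structure to a concentrated point can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `localized_2d_spectrum` and `dominant_component` are hypothetical, the Hann window and mean removal are assumptions, and a synthetic patch with evenly spaced harmonic ridges stands in for one localized region of a narrowband spectrogram.

```python
import numpy as np


def localized_2d_spectrum(region):
    """Magnitude of the 2-D DFT of one windowed time-frequency patch.

    `region` has shape (n_freq, n_time). Mean removal and the separable
    Hann window are illustrative choices, not taken from the paper.
    """
    region = np.asarray(region, dtype=float)
    n_freq, n_time = region.shape
    window = np.outer(np.hanning(n_freq), np.hanning(n_time))
    patch = (region - region.mean()) * window
    return np.abs(np.fft.fft2(patch))


def dominant_component(mag):
    """Locate the strongest peak in the 2-D magnitude spectrum.

    Returns (cycles along the frequency axis, cycles along the time axis).
    Only non-negative frequency-axis indices are searched, because the
    spectrum of a real patch is conjugate-symmetric.
    """
    n_freq, n_time = mag.shape
    half = mag[: n_freq // 2 + 1, :]
    k_f, k_t = np.unravel_index(np.argmax(half), half.shape)
    k_f, k_t = int(k_f), int(k_t)
    if k_t > n_time // 2:  # map upper DFT bins to negative cycles
        k_t -= n_time
    return k_f, k_t


if __name__ == "__main__":
    # Synthetic patch: harmonic ridges every 8 frequency bins, constant pitch
    # over 32 frames (a stand-in for a narrowband spectrogram region).
    n_freq, n_time, spacing = 64, 32, 8
    f = np.arange(n_freq)[:, None]
    patch = 1.0 + np.cos(2 * np.pi * f / spacing) * np.ones((1, n_time))

    k_f, k_t = dominant_component(localized_2d_spectrum(patch))
    print("peak (freq-axis cycles, time-axis cycles):", (k_f, k_t))
    print("implied harmonic spacing in bins:", n_freq / k_f)
```

In this sketch a stationary pitch places the peak on the frequency-modulation axis, at a distance set by the harmonic spacing. A second source with a different spacing would add its own peak elsewhere on that axis, and a changing pitch would tilt the ridges and move the peak off the axis, which is the separability the abstract describes.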
Cite as: Wang, T.T., Quatieri, T.F. (2009) 2-d processing of speech for multi-pitch analysis. Proc. Interspeech 2009, 2827-2830, doi: 10.21437/Interspeech.2009-722
@inproceedings{wang09j_interspeech,
  author={Tianyu T. Wang and Thomas F. Quatieri},
  title={{2-d processing of speech for multi-pitch analysis}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2827--2830},
  doi={10.21437/Interspeech.2009-722}
}