The automatic determination of pitch is one of the most difficult tasks in speech processing. Many pitch determination algorithms have been proposed, but very few of them seem to work properly in a noisy environment. Dik J. Hermes reviews the most recently published work on pitch determination and notes that very few algorithms have been properly evaluated on speech data.
Among the newer pitch determination algorithms is the work by Medan et al., which demonstrates pitch estimation with good resolution and robustness. These algorithms seem to work well for normal speech signals under certain noise conditions, but they tend to fail on speech that has been transmitted through various telephone systems in a very noisy environment. The telephone system acts like a bandpass filter that can attenuate the fundamental, particularly for low pitch values (F0 < 120 Hz). Furthermore, speech in a very noisy environment can be degraded to the point where its low-frequency components become entirely unreliable. Therefore, to be robust to noise and usable within a speech processing system, the proposed pitch determination algorithm (PDA) has to rely on components other than the low-frequency ones.
Another difficulty lies in evaluating PDA performance. A reference pitch is usually needed, and the PDA output is compared against it. We compare the proposed PDA both with a reference algorithm and with hand-labelled pitch.
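The following is a minimal sketch of why the telephone channel removes the fundamental while the harmonic structure survives. It assumes an 8 kHz sampling rate, an idealised brick-wall telephone passband of 300 to 3400 Hz (`TEL_BAND`), and a synthetic voiced frame made of equal-amplitude harmonics; all function names are hypothetical. The autocorrelation estimator is a textbook method chosen only for illustration and is not the PDA proposed in this paper.

```python
import cmath
import math

FS = 8000                    # sampling rate in Hz (assumption: usual for telephone speech)
TEL_BAND = (300.0, 3400.0)   # idealised telephone passband in Hz (assumption)


def harmonic_frame(f0, n, fs=FS, band=None):
    """Synthetic voiced frame: equal-amplitude cosine harmonics of f0 below fs/2.
    If band=(lo, hi) is given, harmonics outside it are dropped, which acts as an
    ideal (brick-wall) bandpass channel."""
    x = [0.0] * n
    k = 1
    while k * f0 < fs / 2:
        f = k * f0
        if band is None or band[0] <= f <= band[1]:
            for i in range(n):
                x[i] += math.cos(2 * math.pi * f * i / fs)
        k += 1
    return x


def amplitude_at(x, f, fs=FS):
    """Amplitude of the component at frequency f, from a single DFT coefficient."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * i / fs) for i, v in enumerate(x))
    return 2 * abs(acc) / len(x)


def autocorr_pitch(x, fs=FS, fmin=50.0, fmax=400.0):
    """Textbook autocorrelation pitch estimate: pick the lag in [fs/fmax, fs/fmin]
    that maximises the unnormalised autocorrelation and return fs / lag in Hz."""
    n = len(x)
    best_lag, best_r = None, -math.inf
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


if __name__ == "__main__":
    n = 320  # one 40 ms frame at 8 kHz
    full = harmonic_frame(100.0, n)
    tel = harmonic_frame(100.0, n, band=TEL_BAND)
    print(amplitude_at(full, 100.0), amplitude_at(tel, 100.0))  # ~1.0 vs ~0.0
    print(autocorr_pitch(full), autocorr_pitch(tel))            # 100.0 100.0
```

With a 100 Hz fundamental, the band-limited frame has no energy at 100 Hz, yet the autocorrelation still peaks at the 10 ms period because the periodicity is carried by the remaining harmonics. Real telephone channels roll off gradually rather than cutting off sharply, so in practice the fundamental is attenuated rather than removed entirely.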
Another important problem concerns the estimation of the signal-to-noise ratio (SNR) of noisy speech. Since the paper presents experiments on noisy telephone speech, the SNR has to be taken into account. Most reported work on PDAs evaluates performance against the SNR averaged over a sentence or over a whole database. Here again, it is very difficult to compare the results of two different PDAs if the speech databases differ, even when the two databases have similar average SNRs. To alleviate this difficulty, an SNR is estimated for each speech frame, and each pitch value is assessed according to the SNR of its frame. This is done for both the proposed PDA and a reference PDA on the same database.
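A minimal sketch of frame-level SNR bookkeeping follows. It assumes that the clean speech and the additive noise are available separately, as when noise is added to clean recordings. The paper's own SNR estimator is not described in this section, so this is a stand-in. It also assumes that a frame counts as a gross pitch error when the estimate deviates from the reference by more than an assumed 20 % relative threshold, and that a reference value of 0 marks an unvoiced frame. The function names are hypothetical. The `reference` list can hold either hand-labelled pitch or the output of a reference PDA.

```python
import math
from collections import defaultdict


def frame_snr_db(clean, noise, frame_len, hop, eps=1e-12):
    """Per-frame SNR in dB. Assumes the clean speech and the additive noise are
    known separately; eps avoids division by zero and log of zero."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        ps = sum(v * v for v in clean[start:start + frame_len])
        pn = sum(v * v for v in noise[start:start + frame_len])
        snrs.append(10 * math.log10((ps + eps) / (pn + eps)))
    return snrs


def gross_error_by_snr(estimated, reference, snrs, bin_width=5.0, tol=0.2):
    """Gross pitch error rate per SNR bin of width bin_width dB.
    A voiced frame (reference > 0) is an error if |est - ref| > tol * ref;
    unvoiced frames (reference <= 0) are skipped."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [errors, voiced frames]
    for est, ref, snr in zip(estimated, reference, snrs):
        if ref <= 0:
            continue
        b = math.floor(snr / bin_width) * bin_width
        counts[b][1] += 1
        if abs(est - ref) > tol * ref:
            counts[b][0] += 1
    return {b: e / n for b, (e, n) in sorted(counts.items())}
```

Running `gross_error_by_snr` on the same frames and the same per-frame SNRs for both the proposed PDA and a reference PDA produces two error-rate curves over SNR. These curves are comparable frame for frame, which the text argues a single database-wide average SNR cannot provide.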
Cite as: Rouat, J., Liu, Y.C. (1992) A pitch determination algorithm for very noisy telephone speech. Proc. ETRW on Speech Processing in Adverse Conditions, 163-166
@inproceedings{rouat92_spac, author={J. Rouat and Y. C. Liu}, title={{A pitch determination algorithm for very noisy telephone speech}}, year=1992, booktitle={Proc. ETRW on Speech Processing in Adverse Conditions}, pages={163--166} }