Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Correlation Analysis Between Speech Power and Pitch Frequency for Twenty Spoken Languages

Kenzo Itoh

NTT Human Interface Laboratories, Kanagawa, Japan

This paper describes the relationship between speech signal power and pitch frequency for twenty major languages. The goal is to confirm the applicability of our previously proposed power control rule, which is based on this relationship. First, an overall analysis is conducted for each language. Second, the short-term correlation is analyzed to study the relationship between high correlation values and other characteristics of the speech signal. Last, to obtain information for developing an English Text-To-Speech (TTS) system, averaged phoneme power and pitch frequency are analyzed using American English speech signals with phoneme-labeled data. The main results are as follows. (1) The average correlation coefficient ranged from +0.44 to +0.72 for the twenty languages. (2) The short-term analysis found that high-level speech was accompanied by high correlation values. The reason for this is assumed to be a relationship with accentuation or stress. (3) The prospect for phoneme power control in an American English TTS system is excellent, given the strong relationship between power and pitch. These results strongly suggest that the relationship between speech power and pitch frequency can be used in speech processing systems for all spoken languages.
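
The overall and short-term correlation analyses mentioned in the abstract can be illustrated with a minimal sketch. It assumes that frame-level power (in dB) and pitch frequency (in Hz) have already been extracted, that unvoiced frames are marked with a pitch of 0, and that the Pearson coefficient is the correlation measure. The function names, the window length, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either sequence is constant.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def short_term_correlation(power_db, pitch_hz, window=5, hop=1):
    """Correlation in sliding windows over voiced frames only.

    Assumption: a frame is voiced when its pitch value is > 0.
    The window and hop sizes (in frames) are illustrative defaults.
    """
    voiced = [(p, f) for p, f in zip(power_db, pitch_hz) if f > 0]
    results = []
    for start in range(0, len(voiced) - window + 1, hop):
        segment = voiced[start:start + window]
        results.append(
            pearson([p for p, _ in segment], [f for _, f in segment])
        )
    return results


# Made-up frame values for illustration only; frame 4 is unvoiced.
power_db = [60.0, 62.0, 65.0, 0.0, 63.0, 61.0, 58.0, 57.0]
pitch_hz = [110.0, 118.0, 130.0, 0.0, 125.0, 115.0, 105.0, 100.0]

voiced_pairs = [(p, f) for p, f in zip(power_db, pitch_hz) if f > 0]
overall = pearson([p for p, _ in voiced_pairs], [f for _, f in voiced_pairs])
windows = short_term_correlation(power_db, pitch_hz, window=5)

print("overall correlation:", round(overall, 3))
print("short-term correlations:", [round(r, 3) for r in windows])
```

Restricting the computation to voiced frames is a design choice of this sketch: pitch is undefined in unvoiced segments, so including them would distort the coefficient.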

Bibliographic reference. Itoh, Kenzo (1994): "Correlation analysis between speech power and pitch frequency for twenty spoken languages", in ICSLP-1994, 331-334.