ISCA Archive SpeechProsody 2010

A modulation-demodulation model of speech communication

Nobuaki Minematsu

How listeners perceive speech invariantly despite its large acoustic variability has long been discussed in speech science and engineering [1], and it remains an open question [2, 3]. Recently, we proposed a candidate answer based on mathematically guaranteed relational invariance [4, 5]. In that approach, f-divergences, which are completely transform-invariant features, are extracted from the speech dynamics of an utterance and used to represent it. In this paper, we interpret this representation from the viewpoints of telecommunications and evolutionary anthropology. Speech production is often regarded as a process of modulating the baseline timbre of a speaker's voice by manipulating the vocal organs, i.e., spectrum modulation. Extraction of the linguistic content from an utterance can then be viewed as a process of spectrum demodulation. This modulation-demodulation model of speech communication links well to known morphological and cognitive differences between humans and apes. The model also claims that linguistic content is transmitted mainly by supra-segmental features.
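
The transform invariance mentioned above can be checked numerically in a simple case. The following is a minimal sketch, not the procedure of [4, 5]: it assumes that two speech events are modeled as Gaussian distributions and compares them with the Bhattacharyya distance, which is a monotone function of the squared Hellinger distance (an f-divergence). Only an invertible affine transform is tested here. The function names, the dimension, and the matrix `A` are illustrative assumptions.

```python
import numpy as np


def bhattacharyya_distance(m1, S1, m2, S2):
    """Bhattacharyya distance between N(m1, S1) and N(m2, S2)."""
    S = 0.5 * (S1 + S2)
    diff = m1 - m2
    # Mean term: (1/8) * diff^T S^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(S, diff)
    # Covariance term: (1/2) * ln(det S / sqrt(det S1 * det S2))
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_S1 = np.linalg.slogdet(S1)
    _, logdet_S2 = np.linalg.slogdet(S2)
    term_cov = 0.5 * (logdet_S - 0.5 * (logdet_S1 + logdet_S2))
    return term_mean + term_cov


def affine(m, S, A, b):
    """Parameters of a Gaussian after the map x -> A x + b."""
    return A @ m + b, A @ S @ A.T


def random_spd(rng, dim):
    """Random symmetric positive-definite matrix (illustrative covariance)."""
    M = rng.standard_normal((dim, dim))
    return M @ M.T + dim * np.eye(dim)


rng = np.random.default_rng(0)
dim = 3

# Two hypothetical speech events in a 3-dimensional feature space.
m1, S1 = rng.standard_normal(dim), random_spd(rng, dim)
m2, S2 = rng.standard_normal(dim), random_spd(rng, dim)

# An invertible affine transform (upper triangular, determinant 3),
# standing in for a speaker- or channel-dependent distortion.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.5, -1.0],
              [0.0, 0.0, 3.0]])
b = np.array([1.0, -2.0, 0.5])

d_before = bhattacharyya_distance(m1, S1, m2, S2)
d_after = bhattacharyya_distance(*affine(m1, S1, A, b), *affine(m2, S2, A, b))

print(f"before transform: {d_before:.6f}")
print(f"after transform:  {d_after:.6f}")
```

Both printed values should agree up to floating-point error, because the transform changes each distribution but not the contrast between them.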

Index Terms: speech recognition, invariant features, spectrum demodulation, evolutionary anthropology, language acquisition

[1] J. S. Perkell and D. H. Klatt, Invariance and variability in speech processes, Lawrence Erlbaum Associates, Inc., 1986.
[2] R. Newman, “The level of detail in infants' lexical representations and its implications for computational models,” Keynote speech in Workshop on Acquisition of Communication and Recognition Skills (ACORNS), 2009.
[3] S. Furui, “Generalization problem in ASR acoustic model training and adaptation,” Keynote speech in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2009.
[4] N. Minematsu, “Mathematical evidence of the acoustic universal structure in speech,” Proc. ICASSP, 889–892, 2005.
[5] Y. Qiao et al., “A study on invariance of f-divergence and its application to speech recognition,” IEEE Transactions on Signal Processing, 58, 2010.

Cite as: Minematsu, N. (2010) A modulation-demodulation model of speech communication. Proc. Speech Prosody 2010, paper 913

  author={Nobuaki Minematsu},
  title={{A modulation-demodulation model of speech communication}},
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 913}