Speech Recognition and Intrinsic Variation (SRIV2006), Toulouse, France
Speech communication involves several steps: production, encoding, transmission, decoding, and hearing. Every step inevitably introduces acoustic distortions, which arise from differences in vocal tract length, gender, age, microphone, room, line, hearing characteristics, etc. These are static non-linguistic factors and are completely irrelevant to speech recognition. Although the spectrogram always carries these factors, almost all speech applications have been built on this noisy representation. Recently, the first author proposed a novel representation of speech, called the acoustic universal structure. It represents only the interrelations among speech events; their absolute properties are discarded completely. Interestingly, this removes the non-linguistic factors from speech effectively, much as cepstrum smoothing of the spectrogram removes pitch information from speech. The first author has already used this representation in several speech applications. This paper describes its theoretical background in detail from the viewpoints of linguistics, psychology, acoustics, and mathematics, together with some results of recognition experiments and perceptual experiments. It is shown that the new representation can be viewed as a speech Gestalt.
Bibliographic reference. Minematsu, Nobuaki / Nishimura, Tazuko / Nishinari, Katsuhiro / Sakuraba, Kyoko (2006): "Theorem of the invariant structure and its derivation of speech gestalt", In SRIV-2006, 47-52.