A speaker-invariant structural representation of speech was proposed, in which only the phonic contrasts between speech sounds are extracted to form their external structure. The acoustic substances are completely discarded. Considering a mapping function between speaker A's acoustic space and speaker B's, the speech dynamics were mathematically proven to be invariant between the two, irrespective of the form of the function. This structural and dynamic representation was applied to describe learners' pronunciation. Since the non-linguistic factors were removed effectively, the representation could highlight the non-nativeness in individual pronunciations. For vowel learning, the vowels that each learner should correct first were estimated automatically. Unlike the conventional approach, the estimation was done without the direct use of sound substances such as spectra. In this paper, using the learners' vowel charts plotted by an expert phonetician, the validity of this contrastive, or relative, approach is examined by comparing it with the conventional absolute approach. Results show the high validity of the proposed method.
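The invariance claim can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each vowel is modelled as a two-dimensional Gaussian, the contrast between two vowels is measured by the Bhattacharyya distance, and the speaker mapping is restricted to an invertible affine transform, a narrower case than the general mapping the abstract refers to. The vowel parameters, the matrix `A`, the offset `b`, and all function names are hypothetical values chosen for illustration only.

```python
import math

def mat_mul(a, b):
    # Product of two 2x2 matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv(a):
    d = det(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def bhattacharyya(mu1, cov1, mu2, cov2):
    # Bhattacharyya distance between two 2-D Gaussians.
    cov = [[(cov1[i][j] + cov2[i][j]) / 2 for j in range(2)] for i in range(2)]
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    p = inv(cov)
    maha = sum(diff[i] * p[i][j] * diff[j] for i in range(2) for j in range(2))
    return maha / 8 + 0.5 * math.log(det(cov) / math.sqrt(det(cov1) * det(cov2)))

def structure(vowels):
    # Keep only the pairwise contrasts; absolute positions are discarded.
    names = sorted(vowels)
    return {(x, y): bhattacharyya(*vowels[x], *vowels[y])
            for x in names for y in names if x < y}

def affine(vowels, A, b):
    # Map every vowel distribution through x -> A x + b.
    out = {}
    for name, (mu, cov) in vowels.items():
        new_mu = [A[0][0] * mu[0] + A[0][1] * mu[1] + b[0],
                  A[1][0] * mu[0] + A[1][1] * mu[1] + b[1]]
        out[name] = (new_mu, mat_mul(mat_mul(A, cov), transpose(A)))
    return out

# Hypothetical vowel models (mean, covariance) in arbitrary units.
speaker_a = {
    "a": ([0.0, 0.0], [[1.0, 0.2], [0.2, 0.5]]),
    "i": ([2.0, 1.0], [[0.8, -0.1], [-0.1, 0.6]]),
    "u": ([-1.0, 2.5], [[0.5, 0.0], [0.0, 0.9]]),
}
A = [[1.5, 0.4], [-0.3, 0.8]]  # hypothetical invertible speaker mapping
b = [3.0, -2.0]

struct_a = structure(speaker_a)
struct_b = structure(affine(speaker_a, A, b))
max_gap = max(abs(struct_a[k] - struct_b[k]) for k in struct_a)
print(max_gap)  # difference between the two structures, up to rounding error
```

Under these assumptions the two contrast sets agree to within floating-point error, although the means and covariances of speaker B differ from those of speaker A. This is the sense in which a purely contrastive description is insensitive to the speaker mapping.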
Bibliographic reference. Minematsu, Nobuaki / Kamata, K. / Asakawa, Satoshi / Makino, T. / Nishimura, T. / Hirose, Keikichi (2007): "Structural assessment of language learners' pronunciation", In INTERSPEECH-2007, 210-213.