INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Corpus-Testing a Fricative Discriminator; or, Just How Invariant is this Invariant?

Philip J. Roberts (1), Henning Reetz (2), Aditi Lahiri (1)

(1) University of Oxford, UK
(2) Goethe-Universität Frankfurt am Main, Germany

Acoustic cues to the distinction between sibilant fricatives are claimed to be invariant across languages. In [1], Evers et al. present a method for distinguishing automatically between [s] and [ʃ], using the slope of regression lines over separate frequency ranges within a DFT spectrum. They report accuracy rates in excess of 90% for fricatives extracted from recordings of minimal pairs in English, Dutch and Bengali. These findings are broadly replicated by [2], using VCV tokens recorded in the lab.
   We tested the algorithm from [1] against fricative tokens extracted from the TIMIT corpus of American English read speech and the Kiel corpora of German. We achieved accuracy rates similar to those reported in [1] and [2], with two caveats: (1) the measure relies on a DFT covering frequencies from 0 to 8 kHz, so a minimum sampling rate of 16 kHz is necessary for it to be effective, and (2) although the measure distinguishes [s] from [ʃ] as clearly as in previous studies, the absolute value of the threshold between the two sounds is sensitive to the dynamic range of the input signal.
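
The kind of measure described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from [1]: the function name `slope_difference`, the 2.5 kHz split between the two frequency ranges, the Hann window, and the use of a simple difference of slopes are all assumptions made for the example. Only the 0 to 8 kHz DFT range and the 16 kHz minimum sampling rate come from the abstract.

```python
import numpy as np


def slope_difference(signal, sample_rate, split_hz=2500.0, max_hz=8000.0):
    """Return the low-band minus high-band regression slope, in dB per kHz.

    split_hz is a hypothetical boundary between the two frequency ranges;
    the abstract does not state where the ranges begin or end.
    """
    # The spectrum must reach max_hz, so the Nyquist frequency must be >= max_hz.
    if sample_rate < 2 * max_hz:
        raise ValueError("sampling rate too low to cover 0..max_hz")

    signal = np.asarray(signal, dtype=float)

    # Magnitude spectrum of the windowed token, converted to dB.
    # The small constant avoids log10(0) for empty bins.
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    level_db = 20.0 * np.log10(magnitude + 1e-12)
    freqs_khz = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate) / 1000.0

    # Fit one straight line per frequency range and keep only the slopes.
    low = freqs_khz < split_hz / 1000.0
    high = (freqs_khz >= split_hz / 1000.0) & (freqs_khz <= max_hz / 1000.0)
    low_slope = np.polyfit(freqs_khz[low], level_db[low], 1)[0]
    high_slope = np.polyfit(freqs_khz[high], level_db[high], 1)[0]

    return low_slope - high_slope
```

In use, `signal` would hold the samples of a single fricative token, and the returned value would be compared against a decision threshold. Because the slopes are computed on dB levels, that threshold would need to be calibrated per corpus, in line with the second caveat above.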

References

  1. V. Evers, H. Reetz, and A. Lahiri, “Crosslinguistic acoustic categorization of sibilants independent of phonological status,” Journal of Phonetics, vol. 26, pp. 345–370, 1998.
  2. K. Maniwa, A. Jongman, and T. Wade, “Acoustic characteristics of clearly spoken English fricatives,” Journal of the Acoustical Society of America, vol. 125, pp. 3962–3973, 2009.

Bibliographic reference.  Roberts, Philip J. / Reetz, Henning / Lahiri, Aditi (2014): "Corpus-testing a fricative discriminator; or, just how invariant is this invariant?", In INTERSPEECH-2014, 189-192.