Phoneme-like units and speech perception

In English, the perception of syllables and words can be largely predicted from the perception of 'smaller' phoneme-like units. Experiments reviewed by Allen (1994) show that the correct identification of nonsense CVC syllables in noise can be predicted extremely well from the marginal correct identification rates of their constituent phonemes. Simulations by Nearey (to appear, submitted) suggest that this result can be readily achieved when syllable patterns can be 'factored' into phoneme parts, while representations allowing even slightly idiosyncratic relationships between stimuli and syllables cannot reproduce such results. Parametric experiments with synthetic speech, wherein listeners hear syllables that span two or more categories of two or more segments (e.g. bad, bed, bat, bet), also provide evidence for phoneme-factorability (Nearey 1997, submitted). This paper summarizes how models with phoneme-like units as the core phonetic-transponders can be supplemented with bias elements that accommodate such phenomena as transitional probabilities and lexical frequency, providing a good account of many phenomena in the perception of words and nonsense syllables alike. A progress report on the construction of a prototype automatic speech recognition system for a subset of English CVC syllables that conforms to this structure will also be presented.
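
As a rough illustration of what predicting syllable identification from marginal phoneme rates can mean, the sketch below assumes the simplest possible factorization: the three segments of a CVC syllable are identified independently, so the predicted syllable rate is the product of the phoneme rates. The independence assumption, the function name, and the numeric rates are all hypothetical choices made for illustration; they are not taken from Allen (1994) or from Nearey's simulations.

```python
from math import prod


def predicted_syllable_accuracy(phoneme_rates):
    """Predict the correct-identification rate of a whole syllable.

    Assumption (illustrative only): each phoneme is identified
    independently, so the syllable rate is the product of the
    marginal correct-identification rates of its phonemes.
    """
    rates = list(phoneme_rates)
    for p in rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"rate {p} is not a probability")
    return prod(rates)


# Hypothetical marginal rates for the onset, vowel, and coda of one CVC
# syllable presented in noise (made-up numbers, not experimental data).
marginal_rates = {"C1": 0.9, "V": 0.8, "C2": 0.5}

predicted = predicted_syllable_accuracy(marginal_rates.values())
print(f"predicted CVC accuracy: {predicted:.2f}")
```

Under this assumption the predicted syllable rate can never exceed the rate of the worst-identified phoneme. A representation with syllable-specific idiosyncrasies would not be bound by that constraint.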

Allen, J. (1994). How do humans process and recognize speech? IEEE Transactions on Speech and Audio Processing, 2, 567-577.

Nearey, T. (1997). Speech perception as pattern recognition. Journal of the Acoustical Society of America, 101, 3241-3254.

Nearey, T. (to appear). The factorability of phonological units in speech perception: Simulating results on speech reception in noise. In Festschrift for Bruce L. Derwing, edited by R. Smyth.

Nearey, T. (submitted). On the factorability of phonological units in speech perception. Submitted to Labphon VI.

