ISCA Archive SWAP 2000

Why phonological constraints are so granular

Janet Pierrehumbert

The most common word length in the lexicon lies in the middle of the total range. The shortest words -- light V or CV monosyllables -- are necessarily few because a cross-product of the consonantal and vocalic phonemes generates only a small number of combinations. As length increases, the number of possible forms explodes, but fewer and fewer are actually used in any given language. Experimental and computational studies relate the sparsity of long forms to the fact that the likelihood of a form, as determined by a stochastic parse, decreases with length. This effect occurs because long forms have more subparts than short ones, and the likelihood of any given subpart is always less than 1.0 (Coleman and Pierrehumbert 1997; Frisch et al., in press). The disadvantage that long forms have in achieving a high well-formedness score is a distinct phenomenon from the tendency of individual long words to have low token frequencies (though there may of course be some deep relationship between these characteristics). In English, a morphologically impoverished language, the most common type of word is the disyllable.
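The length effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that a form's likelihood is the product of independent subpart probabilities and that every subpart has the same probability; the value 0.2 and the function name `form_log_likelihood` are hypothetical and are not taken from the cited studies.

```python
import math

def form_log_likelihood(subpart_probs):
    """Log-likelihood of a form: the sum of its subparts' log probabilities."""
    return sum(math.log(p) for p in subpart_probs)

# Hypothetical per-subpart probability, chosen only for illustration.
P_SUBPART = 0.2

# Score forms made of 1 to 5 subparts.
scores = {n: form_log_likelihood([P_SUBPART] * n) for n in range(1, 6)}

for n, log_score in scores.items():
    print(n, math.exp(log_score))
```

Because every factor is below 1.0, each additional subpart can only lower the product, so the score falls monotonically with length whatever the particular probabilities are.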

This paper undertakes to relate the distribution of word lengths in the lexicon to the surprising simplicity of phonological constraints. A considerable body of results shows that people have implicit knowledge of phonological constraints which is projected from the lexicon. Experiments have revealed implicit knowledge of syllabic onsets and rhymes, syllable junctures, OCP-Place, vowel harmony, and foot structure. This knowledge originates in the lexicon in the sense that it shows systematic and gradient dependencies on the lexical statistics of particular languages. It is "projected from" the lexicon in the sense that it appears to be coarse-grained in comparison to the set of all possible epiphenomenal regularities which could in principle arise in the lexicon. Phonological constraints are formally shorter than the largest phonological objects which people can encode and remember. Phonological constraints are also cruder than would be predicted by purely phonetic factors, as discussed in recent work by Hayes and Hyman. And long-distance constraints -- such as vowel harmony -- tend to be cruder than local constraints.

Why don't people form long constraints on the basis of the many long words that they know? Why don't they have far more numerous and fine-grained constraints, reflecting the detailed phonetic knowledge that informs individual acts of speech production and perception? In this paper, we undertake to evaluate the contribution of three interacting assumptions to the answer to these questions.

Phonological generalizations are statistically trained over the lexicon.

Phonological generalizations need to be shared by a speech community, because they influence productive behaviour crucial to communicative success, ranging from signal parsing during speech perception to assimilation of neologisms and loan words.

Different members of the speech community have different lexicons; the lexicon of an individual can be roughly viewed as a random frequency-weighted downsampling of a large community dictionary such as CELEX.

In combination, these assumptions imply that the viable phonological generalizations are those which are statistically stable under downsampling of the dictionary. That is, independent downsamples -- reflecting different personal experiences of lexical acquisition -- should lead to essentially comparable estimates of the statistical strength of the constraint. Calculations of the rate of degradation of various possible constraints for English under slight to severe downsampling of CELEX will be presented in connection with this argument. Coarse generalizations over medium-length words emerge as particularly viable because medium-length words are both numerous and frequent.
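The stability test described above can be sketched as follows. This is an illustration under stated assumptions, not the calculation reported in the paper: the lexicon is a synthetic set of CV-syllable strings with Zipf-like frequencies (not CELEX), constraint strength is taken to be the share of word types matching a pattern, and the names `downsample`, `strength`, `stability`, and `toy_lexicon` are hypothetical. Frequency-weighted sampling without replacement uses the Efraimidis-Spirakis key method, which is one of several possible choices.

```python
import random
import statistics

def downsample(lexicon, k, rng):
    """Frequency-weighted sample of k distinct words (Efraimidis-Spirakis keys)."""
    keyed = [(rng.random() ** (1.0 / freq), word) for word, freq in lexicon]
    keyed.sort(reverse=True)
    return [word for _, word in keyed[:k]]

def strength(words, pattern):
    """Share of word types that match a constraint, given as a predicate."""
    return sum(1 for w in words if pattern(w)) / len(words)

def stability(lexicon, pattern, k, n_samples, seed=0):
    """Mean and standard deviation of the estimate over independent downsamples."""
    rng = random.Random(seed)
    estimates = [strength(downsample(lexicon, k, rng), pattern)
                 for _ in range(n_samples)]
    return statistics.mean(estimates), statistics.stdev(estimates)

def toy_lexicon(n_words, seed=1):
    """Synthetic lexicon (assumption, NOT CELEX): CV strings, Zipf-like frequencies."""
    rng = random.Random(seed)
    words = set()
    while len(words) < n_words:
        n_syll = rng.randint(1, 4)
        words.add("".join(rng.choice("ptkbdgmnsl") + rng.choice("aeiou")
                          for _ in range(n_syll)))
    ordered = sorted(words)   # fixed order before shuffling, for reproducibility
    rng.shuffle(ordered)      # assign frequency ranks at random
    return [(w, 1000.0 / rank) for rank, w in enumerate(ordered, start=1)]

lexicon = toy_lexicon(2000)
# A coarse pattern (word-initial p, t, or k) and a fine one (a specific 4-segment string).
coarse = stability(lexicon, lambda w: w[0] in "ptk", k=500, n_samples=20)
fine = stability(lexicon, lambda w: "taki" in w, k=500, n_samples=20)
print("coarse (mean, sd):", coarse)
print("fine   (mean, sd):", fine)
```

One way to read the output is to compare the standard deviation with the mean for each pattern: a constraint whose estimate varies little relative to its size across downsamples is one that different speakers could plausibly agree on. Varying `k` from near the full lexicon size down to a small fraction corresponds to slight versus severe downsampling.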

Cite as: Pierrehumbert, J. (2000) Why phonological constraints are so granular. Proc. Spoken Word Access Processes (SWAP), 123-126
