ISCA Workshop on Plasticity in Speech Perception (PSP2005)

Senate House, London, UK
June 15-17, 2005

Recognising Tones by Tracking Movements - How Infants May Develop Tonal Categories from Adult Speech Input

Bruno Gauthier (1), Rushen Shi (1), Yi Xu (2)

(1) University of Quebec in Montreal, Canada
(2) University College London, UK

Previous research has demonstrated that infants' speech perception shifts gradually from language-general to language-specific during the first year of life. Recent research found that infants learning a tone language begin to show particular response patterns to tones in their native language by the age of six months (Mattock, 2004). The present study uses connectionist modelling to explore how infants might develop tonal categories in Mandarin. F0 is generally considered the main cue for tone perception. However, F0 patterns in connected speech show considerable between-category overlap and large within-category variability. Since speech input to infants consists mainly of multi-word utterances by multiple speakers, tone learning must involve processes that can effectively resolve both types of variability. In this study we explore the Target Approximation model (Xu & Wang, 2001), which characterises surface F0 as asymptotic movements toward underlying pitch targets defined as simple linear functions. The model predicts that it is possible to infer underlying pitch targets from the manner of F0 movements, which may more directly reflect the characteristics of the intended goals. Using the production data of multiple speakers in connected speech from Xu (1997), we trained a self-organising neural network with both F0 profiles and F0 velocity profiles as input, with no initial stipulation about the number of tonal categories to be discovered. F0 velocity profiles (i.e., first derivatives of F0, hereafter referred to as D1) represent the rate at which the vocal folds change fundamental frequency. The testing phase showed that D1 yielded almost perfect categorisation of the four tones, far superior to that obtained with F0. Visualisation techniques showed that the distribution of D1 in the network formed distinct regions of clustering neighbourhoods representing each tone, an organisation pattern resembling the frequency topographic maps observed in the primary auditory cortex.
The results indicate that D1 can effectively abstract away from surface variability and directly reflect underlying articulatory goals. The finding thus points to one way in which infants can successfully arrive at phonetic categories from adult speech, namely, by extracting underlying phonetic targets based on information that directly reflects production. The implications of our finding for understanding the link between speech articulation and motor movements in general will be discussed. Research supported by an FCAR Scholarship, NSERC and FQRSC.
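
As an informal illustration of the D1 idea described above, the following Python sketch generates toy F0 contours that approach a linear pitch target and then takes their first derivative. It is a sketch under stated assumptions, not the authors' implementation: the exponential form of the approach, the function names `simulate_f0` and `velocity_profile`, and all numerical values (onsets, slopes, rate, duration) are hypothetical and chosen only for illustration. The self-organising network itself is not reproduced here.

```python
import numpy as np


def simulate_f0(onset_hz, slope_hz_per_s, intercept_hz,
                duration_s=0.2, n_samples=30, rate_per_s=40.0):
    """Toy contour: the gap between F0 and a linear target decays exponentially.

    Assumption for illustration only; this is not the published model's equation.
    """
    t = np.linspace(0.0, duration_s, n_samples)
    target = slope_hz_per_s * t + intercept_hz
    f0 = target + (onset_hz - intercept_hz) * np.exp(-rate_per_s * t)
    return t, f0


def velocity_profile(t, f0):
    """D1: numerical first derivative of F0 with respect to time, in Hz/s."""
    return np.gradient(f0, t)


# The same rising target produced in two hypothetical pitch ranges,
# with the linear target 100 Hz higher in the second case.
t, f0_low = simulate_f0(onset_hz=110.0, slope_hz_per_s=100.0, intercept_hz=100.0)
_, f0_high = simulate_f0(onset_hz=210.0, slope_hz_per_s=100.0, intercept_hz=200.0)

d1_low = velocity_profile(t, f0_low)
d1_high = velocity_profile(t, f0_high)

# Largest pointwise difference between the two contours, in F0 and in D1.
f0_gap = np.max(np.abs(f0_high - f0_low))
d1_gap = np.max(np.abs(d1_high - d1_low))

print(f"max F0 difference: {f0_gap:.1f} Hz")
print(f"max D1 difference: {d1_gap:.6f} Hz/s")
print(f"D1 at syllable end: {d1_low[-1]:.1f} Hz/s")
```

In this toy setting a constant difference in pitch range separates the two F0 contours but vanishes in D1, and the velocity late in the syllable approaches the slope of the assumed target. Real speech data contain further sources of variability that this sketch does not model.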

