This paper is concerned with three types of causes of error in a system that uses strictly speaker-independent rules to extract linguistic information automatically from measured prosodic parameters (PP) in isolated read sentences in French: erroneous measurements of PP (duration and fundamental frequency) (type-1 errors); differences between speakers who do not fit into the same prosodic mould (type-2 problems); and certain combinations of segmental influences on duration, which cannot be factored out in a strictly bottom-up system (type-3 problems). It suggests that neither further tuning of the existing rules nor statistical learning is a complete solution. Type-1 errors are extrinsic to the prosodic module and can hardly be reduced. An effective way of reducing uncertainties due to type-2 problems is a partial tuning of the set of rules to the particular habits of the speaker: adaptation is feasible because there is a remarkable intra-speaker consistency in prosodic patterning, at least in serially read isolated sentences. Type-3 errors lead to multiple solutions in certain cases. It is therefore necessary to model inter-speaker variability to a certain extent.
Cite as: Vaissière, J. (1989) On automatic extraction of prosodic information for automatic speech recognition system. Proc. First European Conference on Speech Communication and Technology (Eurospeech 1989), 1202-1205, doi: 10.21437/Eurospeech.1989-19
@inproceedings{vaissiere89_eurospeech,
  author={Jacqueline Vaissière},
  title={{On automatic extraction of prosodic information for automatic speech recognition system}},
  year=1989,
  booktitle={Proc. First European Conference on Speech Communication and Technology (Eurospeech 1989)},
  pages={1202--1205},
  doi={10.21437/Eurospeech.1989-19}
}