Third International Conference on Spoken Language Processing (ICSLP 94)
Two methods are proposed for using prosodic features in automatic speech recognition. The first detects the syntactic boundaries of input speech without relying on segmental-level information, which is obtained separately by the ordinary speech recognition process. The second checks the feasibility of recognition results. In the first method, both the microscopic and macroscopic features of fundamental frequency contours were taken into account, and 96% of the manually detectable boundaries were correctly extracted for the ATR continuous speech database. Several schemes were also proposed to reduce insertion errors. For the second method, a scheme of partial analysis-by-synthesis was developed: fundamental frequency contours are generated with a functional model for each segmental-level recognition hypothesis and are compared with the observed contour only over the part where the recognition is ambiguous. The hypothesis whose generated contour best matches the observation is taken as the likely final recognition result. The proposed method was shown to be valid for recognition errors that involve changes in accent types and in syntactic boundaries.
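The partial analysis-by-synthesis step can be illustrated with a minimal sketch. The abstract says only that "a functional model" generates the contours; the sketch below assumes a command-response model of the Fujisaki type (phrase and accent commands added to a baseline in the log-frequency domain), which may differ from the model actually used in the paper. All function names, parameter values, and the hypothesis data structure are hypothetical and chosen for illustration only.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase component (assumed Fujisaki-type form)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component (assumed Fujisaki-type form)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def generate_log_f0(hypothesis, times, base_hz=120.0):
    """Generate a log-F0 contour from the commands implied by one hypothesis."""
    contour = []
    for t in times:
        value = math.log(base_hz)
        for onset, amplitude in hypothesis["phrases"]:
            value += amplitude * phrase_response(t - onset)
        for onset, offset, amplitude in hypothesis["accents"]:
            value += amplitude * (accent_response(t - onset)
                                  - accent_response(t - offset))
        contour.append(value)
    return contour


def partial_distance(observed, generated, times, window):
    """RMS log-F0 difference restricted to the ambiguous time window."""
    start, end = window
    squared = [(o - g) ** 2
               for o, g, t in zip(observed, generated, times)
               if start <= t <= end]
    return math.sqrt(sum(squared) / len(squared))


def select_hypothesis(observed, hypotheses, times, window):
    """Return the label of the hypothesis that best matches the observation."""
    scored = [(partial_distance(observed, generate_log_f0(h, times),
                                times, window), h["label"])
              for h in hypotheses]
    return min(scored)[1]


# Hypothetical example: two hypotheses that differ only in accent timing.
times = [i * 0.01 for i in range(200)]  # 2 s of speech at 10 ms frames
early = {"label": "early accent", "phrases": [(0.0, 0.5)],
         "accents": [(0.3, 0.6, 0.4), (0.8, 1.1, 0.4)]}
late = {"label": "late accent", "phrases": [(0.0, 0.5)],
        "accents": [(0.3, 0.6, 0.4), (1.0, 1.3, 0.4)]}

# Stand-in for a measured contour: here it is synthesized from `late`.
observed = generate_log_f0(late, times)
best = select_hypothesis(observed, [early, late], times, window=(0.7, 1.5))
print(best)
```

Restricting the distance to the ambiguous window means that the parts of the utterance on which all hypotheses agree do not dilute the comparison. In practice the observed contour would come from a pitch tracker and would contain unvoiced gaps and noise, which this sketch does not handle.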
Bibliographic reference. Hirose, Keikichi / Sakurai, Atsuhiro / Konno, Hiroyuki (1994): "Use of prosodic features in the recognition of continuous speech", In ICSLP-1994, 1123-1126.