This paper describes an acoustic-to-phonetic decoder (APD) based on a mixed strategy: a) a bottom-up component, which hypothesizes the most robust information about the speech signal, and b) a top-down component, which verifies acoustic features or the localization of macro-classes on the speech signal. Only the bottom-up strategy is described here. In our system, a phoneme is described as a phonetic network whose nodes are mapped onto the acoustic signal. The coarse phonetic description uses five phonetic networks whose nodes correspond to the acoustic phases of the analyzed sound in the speech signal. These phases are extracted by automatic segmentation using several parameters (energy, pitch, formant frequencies, and acoustic cues from an ear model). The bottom-up APD is divided into three steps: a) the first step localizes pseudo-phonetic segments (called acoustic phases) on the signal and defines phoneme boundaries according to a macro-class description (stop consonants, fricatives, other consonants, vowels, and pauses); b) the second step applies context-sensitive rules to filter out the most improbable solutions; c) the third step labels the most significant phase of each phoneme with acoustic features (using Bayesian methods). Performance is measured by comparing automatically generated labels with manually generated ones: for example, plosive bursts are detected at a rate of 97%, and the occlusive phonetic network at a rate of 94.3%. The strategy is implemented in Prolog II.
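As a rough illustration of steps b) and c) of the bottom-up strategy, the following minimal Python sketch filters macro-class hypotheses with a context rule and then labels one phase with a Bayes decision. It is not the authors' Prolog II implementation: the forbidden-pair rule, the labels `feature_a` and `feature_b`, the Gaussian likelihood model, and every numeric value are assumptions chosen only to make the example run.

```python
from math import exp, pi, sqrt

# Hypothetical context-sensitive rule: adjacent macro-class pairs treated as
# improbable. The pair below is an invented placeholder, not from the paper.
FORBIDDEN_PAIRS = {("pause", "pause")}


def is_plausible(sequence):
    """Return True if no adjacent pair of macro-classes is forbidden."""
    return all(pair not in FORBIDDEN_PAIRS
               for pair in zip(sequence, sequence[1:]))


def filter_hypotheses(hypotheses):
    """Step b): keep only the hypotheses that pass the context rule."""
    return [h for h in hypotheses if is_plausible(h)]


def gaussian(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))


def bayes_label(x, models):
    """Step c): pick the label with the highest posterior probability.

    models maps a label to (prior, mean, std); the Gaussian likelihood is an
    assumption, since the abstract only says "Bayesian methods".
    """
    joint = {label: prior * gaussian(x, mean, std)
             for label, (prior, mean, std) in models.items()}
    total = sum(joint.values())
    posteriors = {label: p / total for label, p in joint.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Candidate macro-class sequences, using the five macro-classes of the paper.
hypotheses = [
    ["pause", "stop", "vowel"],
    ["pause", "pause", "vowel"],
    ["fricative", "vowel", "other_consonant"],
]
kept = filter_hypotheses(hypotheses)

# Arbitrary placeholder models for one scalar cue measured on a phase.
models = {"feature_a": (0.5, 1.0, 0.5), "feature_b": (0.5, 3.0, 0.5)}
label, posteriors = bayes_label(1.2, models)
```

Here `kept` retains the first and third sequences, and the cue value 1.2 is assigned to `feature_a` because it lies much closer to that model's mean. A real decoder would draw its rules and its class statistics from labelled speech data rather than from hand-picked constants.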
Bibliographic reference. Tattegrain, Helene / Caelen, Jean (1989): "Phonetic unit localization in a multi-expert recognition system", In EUROSPEECH-1989, 2256-2259.