EUROSPEECH 2003 - INTERSPEECH 2003
Phonemic restoration refers to the synthesis of masked phonemes in speech when sufficient lexical context is present. Current models for phonemic restoration, however, make no use of lexical knowledge. Such models are inherently inadequate for restoring unvoiced phonemes and may be limited in their ability to restore voiced phonemes too. We present a predominantly top-down model for phonemic restoration. The model uses a missing data speech recognition system to recognize speech utterances as words and activates word templates corresponding to the words containing the masked phonemes. An activated template is dynamically time warped to the noisy word and is then used to restore the speech frames corresponding to the masked phoneme, thereby synthesizing it. The model is able to restore both voiced and unvoiced phonemes. Systematic testing shows that this model performs significantly better than a Kalman-filter-based model.
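The template-warping step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the word has already been recognized, that the template and the noisy word are given as arrays of feature frames, and that a boolean mask marks the frames of the masked phoneme. The function names `dtw_path` and `restore_masked_frames`, the zero local cost for masked frames, and the averaging of aligned template frames are all illustrative assumptions.

```python
import numpy as np

def dtw_path(template, noisy, masked):
    """Align template frames to noisy frames with dynamic time warping.

    Assumption: masked noisy frames carry no evidence, so their local
    cost is set to zero and only reliable frames drive the alignment.
    """
    n, m = len(template), len(noisy)
    # Euclidean distance between every template frame and every noisy frame.
    dist = np.linalg.norm(template[:, None, :] - noisy[None, :, :], axis=2)
    dist[:, masked] = 0.0
    # Accumulated cost with the standard three-way step pattern.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def restore_masked_frames(template, noisy, masked):
    """Replace each masked noisy frame with the mean of the template
    frames that the warping path aligns to it (an illustrative choice)."""
    restored = noisy.copy()
    path = dtw_path(template, noisy, masked)
    for j in np.flatnonzero(masked):
        aligned = [ti for ti, nj in path if nj == j]
        restored[j] = template[aligned].mean(axis=0)
    return restored
```

Frames outside the mask are left untouched, so only the frames of the masked phoneme are synthesized from the template.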
Bibliographic reference. Srinivasan, Soundararajan / Wang, DeLiang (2003): "Schema-based modeling of phonemic restoration", In EUROSPEECH-2003, 2053-2056.