7th International Conference on Spoken Language Processing
September 16-20, 2002
Denver, Colorado, USA
Improving the Role of Unvoiced Speech Segments by Spectral Normalisation in Robust Speech Recognition
Carlos Lima (1), Luís B. Almeida (2), João L. Monteiro (1)
(1) University of Minho, Portugal; (2) Technical University of Lisbon, Portugal
This paper presents a method based on spectral normalisation for extracting
robust speech features in additive noise. The method has
two main goals:
1. The "peaked" spectral zones, where most of the speech energy is
concentrated, must be preserved as much as possible (from clean to noisy
speech features) by the feature extraction process. These spectral
regions are usually the most reliable, owing to their higher speech
energy and to the commonly made assumption that speech and noise are
independent.
2. The speech regions with less energy need more robustness, since noise
is more dominant there and the speech is therefore more corrupted. These
regions usually correspond to unvoiced speech, which includes nearly half
of the consonants.

The proposed normalisation is optimal if both the corrupted process and
the noise process have white-noise characteristics. Optimal normalisation
means that the corrupting noise leaves the means of the observed vectors
of the corrupted process completely unchanged.

For signal-to-noise ratios (SNRs) above 5 dB, the results show that, for
stationary white noise, the proposed normalisation, which ignores the
noise characteristics, outperforms conventional Markov model composition,
which requires the noise to be known. Additionally, if the noise is known,
a reasonable approximation of the inverted system can easily be obtained
by performing noise compensation, further increasing recogniser
performance.
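The abstract does not specify the normalisation itself, so the following is only a toy sketch of the reasoning behind the two goals and the optimality condition. It works at the level of expected power spectra over four frequency bins and assumes independent additive noise, so expected powers add bin by bin. The function names, the example spectra, and the unit-total-power normalisation (`normalise`) are all hypothetical illustrative choices, not the authors' procedure. The sketch shows that white noise shifts low-energy bins far more (in dB) than spectral peaks, and that when both the corrupted process and the noise are white, the normalised spectrum stays the same, while it changes for a peaked spectrum.

```python
import math


def add_noise_power(speech_psd, noise_psd):
    # Assumes independent zero-mean speech and noise: cross terms vanish
    # in expectation, so the expected power spectra add bin by bin.
    return [s + n for s, n in zip(speech_psd, noise_psd)]


def per_bin_distortion_db(clean_psd, noisy_psd):
    # Shift of each frequency bin, in dB, caused by the additive noise.
    return [10.0 * math.log10(y / x) for x, y in zip(clean_psd, noisy_psd)]


def normalise(psd):
    # HYPOTHETICAL normalisation for illustration only: scale the
    # spectrum to unit total power (not the method of the paper).
    total = sum(psd)
    return [p / total for p in psd]


noise = [1.0, 1.0, 1.0, 1.0]      # stationary white noise (flat spectrum)
peaked = [100.0, 10.0, 1.0, 0.1]  # "peaked", voiced-like spectrum
flat = [5.0, 5.0, 5.0, 5.0]       # white process being corrupted

# Goals 1 and 2: low-energy bins are shifted far more than spectral peaks.
shift = per_bin_distortion_db(peaked, add_noise_power(peaked, noise))
print([round(v, 2) for v in shift])

# White process plus white noise: the normalised spectrum is unchanged.
print(normalise(flat), normalise(add_noise_power(flat, noise)))

# Peaked process plus white noise: the normalised spectrum changes.
before = normalise(peaked)
after = normalise(add_noise_power(peaked, noise))
print(max(abs(a - b) for a, b in zip(before, after)))
```

With these made-up numbers, the per-bin shift grows from about 0.04 dB in the strongest bin to about 10.4 dB in the weakest one. This mirrors why the low-energy (often unvoiced) regions are the ones that need extra robustness.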
Lima, Carlos / Almeida, Luís B. / Monteiro, João L. (2002):
"Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition",
In ICSLP-2002, 1573-1576.