A novel speech waveform generation method using a non-maximally decimated filter bank is proposed, in which the spectral features of synthetic sounds are created by modifying the amplitudes of subband samples that are decomposed in advance from impulse or noise waveforms. The proposed method uses two synthesis banks with the maximally decimated pseudo quadrature mirror filter (QMF) bank structure, which is similar to that in the MPEG audio decoder. Consequently, the computational complexity of the proposed method is O(log N) per sample, whereas that of the conventional method based on the source-filter model with an auto-regressive (AR) filter or a mel log spectrum approximation (MLSA) filter is O(N) per sample. In a mean opinion score (MOS) test on speech resynthesized from the analysis of natural speech, the proposed method achieved scores similar to those of the conventional method using the MLSA filter for a female narrator.
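The core idea of shaping a spectrum by scaling pre-decomposed subband samples can be illustrated with a minimal sketch. It is not the method of the paper: as a stand-in for the pseudo-QMF banks it assumes a block orthonormal DCT-II, which is a maximally decimated bank with very short filters, and the function names (`dct_matrix`, `decompose`, `synthesize`), the band count, and the gain values are all hypothetical.

```python
import numpy as np

def dct_matrix(m):
    """Orthonormal DCT-II matrix (m x m); each row acts as one analysis filter."""
    n = np.arange(m)
    k = n.reshape(-1, 1)
    c = np.sqrt(2.0 / m) * np.cos(np.pi * (2 * n + 1) * k / (2 * m))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so that c @ c.T is the identity
    return c

def decompose(excitation, m):
    """Pre-decompose an excitation into m subbands, one sample per band per block.

    Assumption: len(excitation) is a multiple of m.
    """
    blocks = excitation.reshape(-1, m)
    return blocks @ dct_matrix(m).T  # shape: (num_blocks, m)

def synthesize(subbands, gains):
    """Scale each subband by its gain, then invert the transform."""
    m = subbands.shape[1]
    shaped = subbands * gains  # gains: shape (m,) or (num_blocks, m)
    return (shaped @ dct_matrix(m)).reshape(-1)

# Hypothetical usage: an impulse train is decomposed once, then reshaped
# spectrally by per-band gains (here a crude low-pass envelope).
m = 8
excitation = np.zeros(64)
excitation[::16] = 1.0                 # impulse train, period 16 samples
subbands = decompose(excitation, m)    # done once, ahead of time
gains = np.array([1.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0])
output = synthesize(subbands, gains)
```

In this sketch the decomposition of the excitation is done once, and only the gain multiplication and the synthesis transform run for each output block. Passing a gain array of shape `(num_blocks, m)` lets the envelope vary from block to block.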
Index Terms: HMM-based speech synthesis, pseudo quadrature mirror filter bank, non-maximal decimation, embedded systems
Bibliographic reference. Nishizawa, Nobuyuki / Kato, Tsuneo (2012): "Speech synthesis using a non-maximally decimated filter bank for embedded systems", In INTERSPEECH-2012, 1432-1435.