7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Adaptive Estimation of Time-Varying Features from High-Pitched Speech Based on an Excitation Source HMM

Akira Sasou, Kazuyo Tanaka

National Institute of Advanced Industrial Science and Technology, Japan

This paper describes a method for extracting time-varying features that is effective for speech signals with high fundamental frequencies. The proposed method adopts a speech production model consisting of a Time-Varying Auto-Regressive (TVAR) process for the articulatory filter and a Hidden Markov Model (HMM) for the excitation source. The model represents waveform amplitude variations through a time-varying gain of the excitation source. The proposed algorithm extends the Viterbi algorithm so that it adaptively estimates the TVAR coefficients and the time-varying gain while decoding the state transitions of the excitation source HMM. We applied the proposed method to extracting time-varying features from both synthetic and natural speech and confirmed its feasibility.
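
The abstract gives no equations, so the following is only an illustrative sketch of the generative side of a model of this kind, not the authors' estimation algorithm. It assumes a second-order TVAR filter, a cyclic excitation HMM whose first state emits a pulse-like sample, and a linearly rising gain. The function name `synthesize`, the parameters `period` and `stay_prob`, and all numeric values are hypothetical choices made for the example.

```python
import math
import random


def synthesize(n_samples, period, stay_prob=0.0, seed=0):
    """Toy signal: cyclic-HMM excitation scaled by a time-varying gain,
    then filtered by a time-varying AR(2) process (all values assumed)."""
    rng = random.Random(seed)
    state = 0            # current excitation HMM state, 0 .. period-1
    prev1 = prev2 = 0.0  # s[t-1] and s[t-2]
    signal = []
    for t in range(n_samples):
        frac = t / max(n_samples - 1, 1)

        # Assumed TVAR(2) coefficients: a resonance with fixed pole radius
        # whose angle drifts slowly over the utterance.
        radius = 0.95
        theta = math.pi * (0.2 + 0.2 * frac)
        a1 = 2.0 * radius * math.cos(theta)
        a2 = -radius * radius

        # Assumed time-varying gain of the excitation source.
        gain = 0.5 + 0.5 * frac

        # State 0 emits a pulse-like sample; other states emit near-zero noise.
        mean = 1.0 if state == 0 else 0.0
        excitation = rng.gauss(mean, 0.01)

        sample = a1 * prev1 + a2 * prev2 + gain * excitation
        signal.append(sample)
        prev2, prev1 = prev1, sample

        # Left-to-right cyclic transition, with an optional self-loop
        # that lets the pitch period vary.
        if rng.random() >= stay_prob:
            state = (state + 1) % period
    return signal


if __name__ == "__main__":
    # A short period means many pitch pulses per filter response,
    # which is the high-pitched case the abstract targets.
    x = synthesize(n_samples=200, period=8, seed=1)
    print(len(x), max(abs(v) for v in x))
```

With a short `period` the filter's response to one pulse has not decayed before the next pulse arrives, which is the usual reason frame-based analysis struggles on high-pitched speech. Estimation in the paper runs in the opposite direction, recovering the coefficients, gain, and state sequence from an observed signal.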

Bibliographic reference. Sasou, Akira / Tanaka, Kazuyo (2002): "Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM", in ICSLP-2002, 2161-2164.