INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components

Yasunori Ohishi (1), Hirokazu Kameoka (1), Daichi Mochihashi (2), Kunio Kashino (1)

(1) NTT Communication Science Laboratories, NTT Corporation, Japan
(2) The Institute of Statistical Mathematics, Japan

We present a novel stochastic model of singing voice fundamental frequency (F0) contours for characterizing expressive dynamic components such as vibrato and portamento. Although these dynamic components can be important features for any singing voice application, modeling them and extracting them from a raw F0 contour has yet to be accomplished. We therefore describe the process that generates the dynamic components explicitly and represent it as a stochastic model. We then develop an algorithm for estimating the model parameters based on statistical techniques. Experimental results show that our method successfully extracts the expressive components from raw F0 contours.

Index Terms: Singing voice, Fundamental frequency, Second-order linear system, Stochastic model
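
As an illustration of the "second-order linear system" named in the index terms, the following is a minimal sketch and not the authors' model. The abstract gives no equations, so the sketch assumes that a stepwise note target (in cents) drives an underdamped second-order system, whose overshoot and smooth transition loosely resemble the dynamic components mentioned above. The function name, the parameter values (omega, zeta, dt), and the note sequence are hypothetical, and the simulation is deterministic: it covers neither the stochastic formulation nor the parameter estimation algorithm.

```python
import math


def second_order_response(target, omega, zeta, dt):
    """Simulate y'' + 2*zeta*omega*y' + omega^2*y = omega^2*u.

    target: list of target pitch values u (assumed here to be in cents)
    omega:  natural angular frequency in rad/s (hypothetical value)
    zeta:   damping ratio; zeta < 1 gives an oscillatory overshoot
    dt:     frame period in seconds
    Uses a simple semi-implicit Euler step, which is adequate for small dt.
    """
    y = target[0]  # start at rest on the first note
    v = 0.0        # pitch velocity
    out = []
    for u in target:
        a = omega * omega * (u - y) - 2.0 * zeta * omega * v
        v += a * dt
        y += v * dt
        out.append(y)
    return out


# Hypothetical note sequence: 0.5 s at 6000 cents, then 1.5 s at 6200 cents.
dt = 0.005  # 5 ms frames (assumption)
target = [6000.0] * 100 + [6200.0] * 300
contour = second_order_response(target, omega=2.0 * math.pi * 6.0, zeta=0.3, dt=dt)

# Overshoot above the second note, in cents.
overshoot = max(contour) - 6200.0
print(f"overshoot: {overshoot:.1f} cents, final value: {contour[-1]:.1f} cents")
```

With a damping ratio below 1 the simulated contour overshoots the new note before settling on it. Raising zeta toward 1 removes the overshoot and leaves only a smooth glide between notes.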

Bibliographic reference. Ohishi, Yasunori / Kameoka, Hirokazu / Mochihashi, Daichi / Kashino, Kunio (2012): "A stochastic model of singing voice F0 contours for characterizing expressive dynamic components", In INTERSPEECH-2012, 474-477.