This paper presents a method for modeling the influence of glottal excitation on the STRAIGHT spectrum by fitting the spectral envelope with a mixture of Gaussians (MOG). The first Gaussian component is used as an estimate of the glottal formant in the STRAIGHT spectrum, because analysis shows that it correlates clearly more strongly with fundamental frequency (F0) than the other spectral components do and that its characteristics resemble those of the glottal formant. Linear regression is then used to measure the relationship between F0 and the parameters of the first Gaussian component. The model is applied to the STRAIGHT synthesis process and is shown to be effective in compensating for the voice quality variation caused by pitch modification.

1. INTRODUCTION

STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum) [1] is a high-quality VOCODER-type analysis-synthesis method presented in recent years. Its analysis process is based on fundamental frequency extraction using TEMPO (Time-Domain Excitation extractor using Minimum Perturbation Operator) [2] and pitch-adaptive spectral smoothing in both the time and frequency domains. The synthesis process passes a series of impulse excitations, spaced at pitch-period intervals, through a time-varying filter calculated from the smoothed spectral envelope. Flexible prosody modification is realized by manipulating the pulse positions in the excitation [1].

In the STRAIGHT analysis and synthesis framework, the excitation carries only pitch information. Therefore, according to the general speech production hypothesis, the smoothed spectral envelope combines the spectral representation of the glottal waveform with the vocal tract transfer function. This paper presents a method for measuring the effect of glottal excitation on the STRAIGHT spectral envelope. The method serves the following two purposes, of which this paper focuses on the first:

1. It has been shown that some spectral characteristics of the glottal waveform depend not only on the parameters that define phonation type or voice quality, such as OQ and SQ, but also on the fundamental frequency of the glottal source [3]. By modeling the effect of F0 on the STRAIGHT spectrum, pitch modification during STRAIGHT synthesis can be improved.
2. By decomposing the STRAIGHT spectrum into source-dependent and source-independent components, we can provide an alternative means of voice quality modification within the STRAIGHT framework.

The mixture of Gaussians (MOG) model [4][5] is a method for modeling the speech spectrum. Compared with linear predictive or cepstral coefficients, its parameters have a clearer physical meaning, since they fit spectral peaks, and are more independent of one another. It is therefore used here to estimate the glottal formant in the STRAIGHT spectrum.

The rest of this paper is organized as follows. Section 2 introduces the method. Section 3 gives experimental results and related analysis. Sections 4 and 5 present the discussion and conclusions.

2. METHOD

2.1. The spectral representation of glottal waveform

The spectral characteristics of the glottal waveform are studied using the LF model [6], which describes the shape of the differentiated glottal airflow with five parameters: T0, EE, RA, RG, and RK. The open quotient of the glottal source is related to both RG and RK: OQ = (1 + RK)/(2 RG). As mentioned in [3], the spectrum of the LF model has two main characteristics:
Cite as: Ling, Z., Wang, Y., Hu, Y., Wang, R. (2004) Modeling Glottal Effect on the Spectral Envelop of STRAIGHT using Mixture of Gaussians. Proc. International Symposium on Chinese Spoken Language Processing, 73-76
@inproceedings{ling04_iscslp,
  author={Zhenhua Ling and Yuping Wang and Yu Hu and RenHua Wang},
  title={{Modeling Glottal Effect on the Spectral Envelop of STRAIGHT using Mixture of Gaussians}},
  year=2004,
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing},
  pages={73--76}
}