15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Chaotic Mixed Excitation Source for Speech Synthesis

Hemant A. Patil, Tanvina B. Patel

DA-IICT, India

Linear Prediction (LP) analysis has proven to be a very powerful and widely used method in speech analysis and synthesis. In the LP-based approach, synthesis is carried out by exciting an all-pole model whose parameters are derived by LP analysis. The conventional excitation is a mixed source consisting of a sequence of impulses for voiced regions and a white-noise source for unvoiced regions. In this paper, we present a novel chaotic excitation source based on the chaotic titration method. The voiced and unvoiced regions in speech are modeled by chaos, which is quantified by adding noise of known standard deviation (determined using the chaotic titration method). It is observed that, on average for synthesized voices (both male and female), MOS increases from 2 to 2.4, DMOS increases from 2.1 to 2.4, and preference in an A/B test increases from 39% to 61%. The PESQ score increases from 1 to 1.5 and the MCD score decreases from 4.06 to 4.03 for voices synthesized with the proposed chaotic mixed excitation source. The relatively better performance of the proposed approach may be due to the novel chaotic mixed source of excitation.
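
The conventional LP synthesis described above (an impulse train or white noise driving an all-pole filter) can be illustrated with a minimal sketch. It shows only the baseline mixed excitation, not the proposed chaotic source, whose details are not given in the abstract. The function names, the pitch period of 80 samples, the frame length of 240 samples, and the second-order predictor coefficients are all hypothetical values chosen for illustration.

```python
import random


def mixed_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Baseline excitation: impulse train if voiced, white Gaussian noise if not.

    pitch_period (in samples) and seed are illustrative assumptions.
    """
    if voiced:
        # One unit impulse at the start of every pitch period.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(excitation, lp_coeffs, gain=1.0):
    """Filter the excitation through an all-pole model.

    Implements s[n] = gain * e[n] + sum_k a_k * s[n - k], where a_k are
    the LP predictor coefficients (k = 1..p).
    """
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


# Hypothetical stable 2nd-order predictor (pole radius about 0.89).
lp_coeffs = [1.3, -0.8]

voiced_frame = all_pole_filter(mixed_excitation(240, voiced=True), lp_coeffs)
unvoiced_frame = all_pole_filter(mixed_excitation(240, voiced=False), lp_coeffs)
```

In a real synthesizer the coefficients and gain would come from frame-by-frame LP analysis of recorded speech, and the voiced/unvoiced decision and pitch period from a pitch tracker; here they are fixed by hand so the sketch stays self-contained.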

