ISCA Archive ICSLP 1994

A common phone model representation for speech recognition and synthesis

Mats Blomberg

A combined representation of context-dependent phones at the production-parameter level and at the spectral level is described. The phones are trained in the production domain using analysis-by-synthesis and piece-wise linear approximation of parameter trajectories. For recognition, this representation is transformed into spectral subphones using a cascade formant synthesis procedure. In a connected-digit recognition task, an average correct-digit rate of 99.1% was achieved for a group of seven male speakers when, for each test speaker, training was done on the other six. Simple rules for male-to-female transformation of the male phone library increased recognition performance for six female speakers from 88.9% without transformation to 96.3%. In informal listening tests, resynthesised digit strings were judged intelligible, although far from natural.
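The following is a minimal sketch of turning piece-wise linear formant trajectories into spectral subphones via a cascade of formant resonators. It is not the procedure used in the paper. It assumes Klatt-style second-order digital resonators, a 16 kHz sampling rate, subphones taken as equal-length segments sampled at their centres, and fixed formant bandwidths. All function names, trajectories, and numeric values are hypothetical.

```python
import cmath
import math
from bisect import bisect_right


def piecewise_linear(breakpoints, t):
    """Evaluate a piece-wise linear trajectory at time t.

    breakpoints: list of (time, value) pairs sorted by strictly increasing time.
    Outside the covered range the end-point values are held constant.
    """
    times = [bt for bt, _ in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def resonator_gain_db(freq, bandwidth, f, fs):
    """Gain (dB) at frequency f of a second-order digital resonator
    (Klatt-style coefficients, normalised to unity gain at 0 Hz)."""
    T = 1.0 / fs
    c = -math.exp(-2 * math.pi * bandwidth * T)
    b = 2 * math.exp(-math.pi * bandwidth * T) * math.cos(2 * math.pi * freq * T)
    a = 1 - b - c
    z_inv = cmath.exp(-2j * math.pi * f * T)
    h = a / (1 - b * z_inv - c * z_inv ** 2)
    return 20 * math.log10(abs(h))


def cascade_spectrum_db(formants, bandwidths, freqs, fs):
    """Cascade connection: resonator gains in dB add at every frequency."""
    return [
        sum(resonator_gain_db(F, B, f, fs) for F, B in zip(formants, bandwidths))
        for f in freqs
    ]


def subphone_spectra(trajectories, bandwidths, n_subphones, duration, freqs, fs):
    """Sample each formant trajectory at the centre of n_subphones equal
    segments and return one cascade spectrum per segment (hypothetical scheme)."""
    spectra = []
    for k in range(n_subphones):
        t = (k + 0.5) * duration / n_subphones
        formants = [piecewise_linear(tr, t) for tr in trajectories]
        spectra.append(cascade_spectrum_db(formants, bandwidths, freqs, fs))
    return spectra


if __name__ == "__main__":
    # Hypothetical 100 ms phone: F1 rises, F2 falls, F3 stays constant.
    trajectories = [
        [(0.0, 300.0), (0.1, 700.0)],
        [(0.0, 2200.0), (0.05, 1600.0), (0.1, 1200.0)],
        [(0.0, 2600.0), (0.1, 2600.0)],
    ]
    bandwidths = [60.0, 90.0, 120.0]
    freqs = [250.0 * i for i in range(17)]  # 0 .. 4000 Hz
    for spectrum in subphone_spectra(trajectories, bandwidths, 3, 0.1, freqs, 16000):
        print([round(x, 1) for x in spectrum])
```

Summing dB gains is a convenient way to express a cascade, since the magnitude responses of series resonators multiply. A recogniser would compare such subphone spectra with the spectral analysis of incoming speech frames.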


Cite as: Blomberg, M. (1994) A common phone model representation for speech recognition and synthesis. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 1875-1878

@inproceedings{blomberg94_icslp,
  author={Mats Blomberg},
  title={{A common phone model representation for speech recognition and synthesis}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={1875--1878}
}