1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages

Porto Salvo, Portugal
September 3-4, 2009

Hierarchical language models based on classes of phrases: formulation, learning and decoding. (Original: “Modelos de lenguaje jerárquicos basados en clases de phrases: formulación, aprendizaje y decodificación.”) [PhD Thesis]

Raquel Justo

Department of Electricity and Electronics. University of the Basque Country. Spain

This thesis focuses on stochastic language modeling. A stochastic language model captures how words combine in a given language by means of probability distributions over linguistic events, such as the frequency with which words appear in sentences. Robust estimation of the parameters of such models requires large amounts of training data, which are not always available.
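As a minimal illustration of the data-sparseness problem (not taken from the thesis), the sketch below estimates a word bigram model by maximum likelihood, i.e. relative frequencies of word pairs. The toy corpus and the function name `train_bigram` are hypothetical. A plausible but unseen word pair receives zero probability, which is exactly what larger training sets, or generalization through classes, are meant to mitigate.

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(word | prev) from tokenized sentences."""
    history_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence start/end markers
        history_counts.update(tokens[:-1])
        pair_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, word)] / history_counts[prev]

    return prob


# Hypothetical toy corpus.
corpus = [["the", "train", "leaves"], ["the", "train", "arrives"], ["a", "bus", "leaves"]]
p = train_bigram(corpus)
print(p("train", "leaves"))  # 0.5
print(p("bus", "arrives"))   # 0.0: plausible, but never observed
```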
   In this work, a two-level hierarchical language model based on classes of phrases is proposed to deal with data sparseness. Each level of the model is associated with a different knowledge source. The upper level captures the relations among classes, i.e. among the abstract entities used to generalize, while the lower level captures the relations among words. The cooperation between the two levels yields an improved language model. Within this framework, different approaches and ways of combining the models are defined and formulated.
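The two-level idea can be sketched as follows, under loudly stated assumptions: this is a generic class-of-phrases scoring scheme, not the thesis's exact formulation. The upper level is a bigram over phrase classes, the lower level is a separate word bigram model inside each class, and the sentence is scored for ONE fixed segmentation into (class, phrase) pairs. The class names, probability tables and function names are hypothetical toy values; in practice they would be learned from data, and a complete model would also sum or maximize over all segmentations and class assignments.

```python
import math

# Hypothetical upper level: class bigram P(c_k | c_{k-1}); "<s>"/"</s>" mark sentence bounds.
CLASS_BIGRAM = {
    ("<s>", "ORIGIN"): 0.6, ("<s>", "DEST"): 0.4,
    ("ORIGIN", "DEST"): 0.7, ("ORIGIN", "</s>"): 0.3,
    ("DEST", "ORIGIN"): 0.2, ("DEST", "</s>"): 0.8,
}

# Hypothetical lower level: one word bigram model per class; "<p>"/"</p>" mark phrase bounds.
WORD_BIGRAM_IN_CLASS = {
    "ORIGIN": {("<p>", "from"): 1.0, ("from", "bilbao"): 0.5, ("from", "madrid"): 0.5,
               ("bilbao", "</p>"): 1.0, ("madrid", "</p>"): 1.0},
    "DEST": {("<p>", "to"): 1.0, ("to", "porto"): 0.6, ("to", "lisbon"): 0.4,
             ("porto", "</p>"): 1.0, ("lisbon", "</p>"): 1.0},
}


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def phrase_log_prob(cls, phrase):
    """Lower level: log P(phrase | class) from that class's word bigram model."""
    table = WORD_BIGRAM_IN_CLASS[cls]
    tokens = ["<p>"] + list(phrase) + ["</p>"]
    return sum(safe_log(table.get(pair, 0.0)) for pair in zip(tokens[:-1], tokens[1:]))


def sentence_log_prob(segmentation):
    """Joint log-probability of the words and ONE given (class, phrase) segmentation."""
    logp, prev = 0.0, "<s>"
    for cls, phrase in segmentation:
        logp += safe_log(CLASS_BIGRAM.get((prev, cls), 0.0))  # upper level: class relations
        logp += phrase_log_prob(cls, phrase)                  # lower level: word relations
        prev = cls
    return logp + safe_log(CLASS_BIGRAM.get((prev, "</s>"), 0.0))


example = [("ORIGIN", ["from", "bilbao"]), ("DEST", ["to", "porto"])]
print(math.exp(sentence_log_prob(example)))  # product 0.6 * 0.5 * 0.7 * 0.6 * 0.8
```

Keeping one word model per class lets the class level generalize across word sequences that were never seen together in training, while the word level still models the internal structure of each phrase.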
   Throughout this work, language modeling has been explored within the framework of Automatic Speech Recognition (ASR). Accordingly, a methodology to integrate the proposed models into the decoding stage of the ASR system has been developed. To validate the presented approaches, experiments have been carried out on several databases, covering three different languages and tasks of varying complexity, including spontaneous and read speech.
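For context, any language model enters ASR decoding through the standard Bayes decision rule, where A denotes the acoustic observations and W a candidate word sequence. The scale factor \alpha on the language model term is common practice and an assumption here, as is the Viterbi-style approximation of P(W) by the best segmentation S = s_1 ... s_K into phrases with classes C = c_1 ... c_K; the thesis's own integration methodology may differ.

```latex
\hat{W} = \arg\max_{W} \; P(A \mid W)\, P(W)^{\alpha},
\qquad
P(W) \approx \max_{S,\,C} \prod_{k=1}^{K} P(c_k \mid c_{k-1})\, P(s_k \mid c_k)
```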
   In addition, the use of the proposed hierarchical language models within a dialogue system prototype has been explored. In this case, the main goal is to maximize the performance of the system under real working conditions.
   Finally, a translation model with the same hierarchical structure has been defined, formulated and integrated into a speech translation system. The methodology used to integrate the language model into the ASR system can be applied directly to this case.


Bibliographic reference. Justo, Raquel (2009): "Hierarchical language models based on classes of phrases: formulation, learning and decoding. (Original: “Modelos de lenguaje jerárquicos basados en clases de phrases: formulación, aprendizaje y decodificación.”) [PhD Thesis]", In SLTECH-2009, 111-112.