ISCA Archive WSLP 2003

Vishleshika: Statistical text analyzer for Hindi and other Indian languages

Sunita Arora, Karunesh Kr. Arora, S. S. Agarwal

The vast majority of knowledge and information is available in natural language and stored as text in books, articles, reports, etc. This knowledge source needs to be converted into a digital knowledge base so that it is easily accessible through computers and networks and can be used in the development of human-machine communication systems. Statistical analysis of text can provide information about the phonetic and linguistic description and structure of a given language, which can be used for developing knowledge-based language and speech systems for communication.

This paper describes the development of a software tool named Vishleshika for conducting detailed statistical analysis of the Hindi language; the tool is adaptable to other Indian languages. Several types of statistical analysis, from simple frequency counts and linguistic features to syntactic and semantic analysis, can be done with the help of this package. The objective is to shift the burden of many linguistic decisions to the statistical analysis.
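
As an illustration of the simple frequency counts mentioned above, the following is a minimal sketch and not the Vishleshika implementation. It assumes the text has already been decoded to Unicode (the ISCII handling of the actual tool is not shown), and the function names `char_frequencies` and `word_frequencies` are hypothetical.

```python
from collections import Counter
import unicodedata


def char_frequencies(text):
    """Count Devanagari letters and vowel signs, ignoring everything else.

    Assumption: the input is a Unicode string; ISCII decoding is out of scope.
    """
    return Counter(
        ch
        for ch in text
        # Keep only code points in the Devanagari block (U+0900..U+097F)
        # whose general category is a letter (L*) or a mark (M*).
        if "\u0900" <= ch <= "\u097F"
        and unicodedata.category(ch)[0] in ("L", "M")
    )


def word_frequencies(text):
    """Count whitespace-separated words, treating the danda as a separator."""
    words = text.replace("\u0964", " ").split()
    return Counter(words)


if __name__ == "__main__":
    sample = "राम घर गया। राम आया।"
    print(word_frequencies(sample).most_common(3))
    print(char_frequencies(sample).most_common(3))
```

A real analyzer would likely count at the level of phonemes or syllables rather than raw code points; code points are used here only to keep the sketch short.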

The input text may be a single ISCII file or a set of several files. Results can be copied, printed, or saved to a file. The tool is specially designed for use by linguists, computational linguists, knowledge engineers, lexicographers, speech database creators, spoken language system developers, language teachers, and students.

The results obtained by analyzing a sample corpus of Hindi text, in terms of phonetic and linguistic observations, are presented and discussed.

Cite as: Arora, S., Arora, K.K., Agarwal, S.S. (2003) Vishleshika: Statistical text analyzer for Hindi and other Indian languages. Proc. Workshop on Spoken Language Processing, 191-198

  author={Sunita Arora and Karunesh Kr. Arora and S. S. Agarwal},
  title={{Vishleshika: Statistical text analyzer for Hindi and other Indian languages}},
  booktitle={Proc. Workshop on Spoken Language Processing},