2nd Workshop on Spoken Language Technologies for Under-Resourced Languages

Universiti Sains Malaysia, Penang, Malaysia
May 3-5, 2010

Analysis of Noori Nasta'leeq for Major Pakistani Languages

M. G. Abbas Malik (1), Christian Boitet (1), Pushpak Bhattachariyya (2)

(1) GETALP-LIG, Université de Grenoble (Ex. Université Joseph Fourier), France
(2) CSE, IIT Bombay, India

Nasta'leeq is a bidirectional, diagonal, non-monotonic, cursive, highly context-sensitive and very complex writing style for languages such as Urdu, Punjabi, Balochi and Kashmiri, each of which is written in a variant of the Perso-Arabic script. The style is characterized by well-formed orthographic rules that have been passed down through generations of calligraphers and preserved in old manuscripts. It remains in use today in calligraphic art and printed materials, but its orthographic rules have not been quantitatively analyzed in detail for the above-mentioned languages. This paper first presents the salient features of the Perso-Arabic script and briefly introduces its different writing styles. It then briefly discusses the alphabets of the major Pakistani languages. Next, it gives a quantitative analysis of Nasta'leeq and explains its context-sensitive behavior with respect to Pakistani languages, noting that the analysis applies equally to Arabic, Persian and other languages written in derivations of the Perso-Arabic script. Finally, it discusses the Context-Sensitive Substitution Grammar of Nasta'leeq, a computational model of Nasta'leeq.

Index Terms: Nasta'leeq, script, Arabic, Persian, Urdu, Punjabi, Sindhi, Balochi, Kashmiri

Bibliographic reference. Malik, M. G. Abbas / Boitet, Christian / Bhattachariyya, Pushpak (2010): "Analysis of Noori Nasta'leeq for major Pakistani languages", in SLTU-2010, 95-103.