11th Century Kannada Literature Now on Wikisource

Posted by Subhashish Panigrahi at Mar 30, 2014 05:00 PM
This blog post by Pavithra Hanchagaiah and Omshivaprakash HI was edited by Rohini Lakshane for DNA. It was first edited by Subhashish Panigrahi, CIS-A2K, for the Wikimedia Foundation blog, where it was first published.

The post was republished on DNA on March 30, 2014. The original was posted on the Wikimedia Blog.

In Kannada poetry, Vachana Sahitya is a form of rhythmic writing that evolved in the eleventh century and flourished in the twelfth, as a part of the “Lingayatha” movement. More than 259 Vachanakaras (Vachana writers) have compiled over 11,000 vachanas. 21,000 of these verses, which were published in the 15-volume “Samagra Vachana Samputa” by the government of Karnataka, have been digitised. Two Wikimedians, along with Kannada linguist and author O. L. Nagabhushana Swamy, are involved in the Unicode conversions, corrections and writing of a preface for these verses. The entire work is now available as a standalone project called “Vachana Sanchaya” and is ready to enrich Kannada Wikisource.

This project was started a year ago when Kannada Wikimedian Omshivaprakash was trying to help Professor O. L. Nagabhushana Swamy and Kannada author and publisher Vasudhendra access the vachana (verses) of Vachana Sanchaya. Swamy had trouble using publicly available content on Vachanas since the data was in ASCII encoding and searching the text was a huge problem. I (Pavithra Hanchagaiah) started to help gather information about vachanas and document it in Unicode by writing scripts for open source software.

Further discussions followed on getting thousands of vachanas into a database, so that they could be made easily searchable with an index. This demanded that we build a platform supporting all these activities, which would help linguistic researchers, students and members of the general public who have an interest in reading and studying Vachana literature. With this idea, Omshivaprakash started designing the model, and his colleague Devaraju started building it. In the meantime, I was running various scripts to fix errors in the conversion of ASCII text to Unicode, confirming that the data was ready to be consumed by the modules developed for concordance.

We spent weekends and holidays executing this project from home. With constant feedback and guidance from Mr Swamy and Vasudhendra, we learnt how concordance of text is used by researchers and what would make it easier for them to do research on Vachana Sahitya. Omshivaprakash worked on the architecture of the platform and decided the infrastructure requirements. Free and open source software technologies were used to keep the platform active while managing the entire project. I provided critical hacks for digitization and gave feedback through suggestions.
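For readers unfamiliar with the term, a concordance is essentially an index from each word to every place it occurs. The following is a minimal sketch in Python of how such an index could be built, not the code used by Vachana Sanchaya. The record layout, the names `SAMPLE`, `tokenize` and `build_concordance`, and the sample text (arbitrary words, not real vachanas) are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict

# Hypothetical records: (vachana_id, vachanakara, text).
# The text is placeholder words only, not real vachanas.
SAMPLE = [
    (1, "Poet A", "ಸತ್ಯ ನದಿ ಕರ್ಮ"),
    (2, "Poet B", "ನದಿ ಸತ್ಯ, ನದಿ"),
]

def tokenize(text):
    """Split text on whitespace and strip surrounding punctuation."""
    # Normalise so that equivalent Unicode sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    words = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()[]")
        if word:
            words.append(word)
    return words

def build_concordance(records):
    """Map each word to a list of (vachana_id, vachanakara, position)."""
    index = defaultdict(list)
    for vachana_id, author, text in records:
        for position, word in enumerate(tokenize(text)):
            index[word].append((vachana_id, author, position))
    return index

concordance = build_concordance(SAMPLE)
for entry in concordance["ನದಿ"]:
    print(entry)
```

Looking up a word in `concordance` returns every occurrence together with its author, which is the kind of lookup a researcher would otherwise do by hand. A production system would likely need a more careful tokenizer for Kannada punctuation and word forms.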

Working System

Currently, the system has around 200,000 unique words in its repository. Vachana Sanchaya is meant for research rather than being just a repository of text on the web. When you search for a word on our system, you can see which Vachanakaras have used it across all Vachanas. To make the results more readable, we highlight the searched text in each Vachana that is displayed. To repeat the search for a specific Vachanakara (poet), you just need to click on the poet's name on the graph on the results page. We have used MediaWiki’s jquery-ime input tool architecture, which helped us provide a feature to directly enter Kannada text in Unicode for searches. So just type, and get results!
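The search behaviour described above (highlighting the searched text and counting usage per Vachanakara) can be illustrated with a short Python sketch. This is an assumption about how such a feature might work, not the platform's actual implementation. The function name `search`, the bracket markers used for highlighting, and the placeholder records are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (vachana_id, vachanakara, text).
# The text is placeholder words only, not real vachanas.
SAMPLE = [
    (1, "Poet A", "ಸತ್ಯ ನದಿ ಕರ್ಮ"),
    (2, "Poet B", "ನದಿ ಸತ್ಯ, ನದಿ"),
]

def search(records, query, open_mark="[", close_mark="]"):
    """Return highlighted matches and the number of matches per author."""
    hits = []
    per_author = Counter()
    if not query:
        # An empty query would match everywhere, so return nothing.
        return hits, per_author
    for vachana_id, author, text in records:
        count = text.count(query)
        if count:
            # Wrap every occurrence of the query in the highlight markers.
            highlighted = text.replace(query, open_mark + query + close_mark)
            hits.append((vachana_id, author, highlighted))
            per_author[author] += count
    return hits, per_author

hits, per_author = search(SAMPLE, "ನದಿ")
for vachana_id, author, highlighted in hits:
    print(vachana_id, author, highlighted)
print(per_author.most_common())
```

The per-author counts are the sort of data a results graph could be drawn from. A web front end would typically emit HTML markup instead of the bracket markers used here.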

Public Response

We are glad to see people accessing vachanas from our Facebook, Twitter and Google+ channels. There have been approximately 500,000 pageviews to our site in the first few months since our platform’s public launch. Interestingly, commonly searched Kannada words like “ಕರ್ಮ” (Karma, en: Work/Deed), “ಸತ್ಯ” (Sathya, en: Truthfulness) and “ನದಿ” (Nadi, en: River) have returned quick and easy results.

Plans for the Future

Our system is extensible with respect to adding new features: we have a review desk for researchers to help us with the review of content. Later we will also be adding the required references to Vachanas from various research works that have been done around this literature. The content is available to the public through an OpenData API and will be distributed as public domain through Wikisource once the review work is complete. This will open up the system for students, developers, researchers and anyone interested in building linguistic tools for Kannada and other Indic languages. This system is meant to evolve around other works rather than having to change and re-invent the wheel for more such projects. Vachana Sahitya will further help us to initiate Natural Language Processing (NLP) projects if more researchers get together to tag the words, glossary etc. in the coming days. We can also fulfill the need for various language tools, like spelling and grammar checkers, by crowd-sourcing the development. The next projects under “Kannada Sanchaya” are Sarvagnana Vachanagalu and Dāsa Sanchaya, which are in the pipeline with the initial phases of work underway. Our idea is to extend this platform from Vyasa to Muddanna and possibly to contemporary literary work available in the public domain.