
Transliteration of Kannada to English (Latin) ISO 15919 in Python


Are you a creature of illusion? Or is illusion your creation?
Are you a part of the body? Or is the body a part of you?
Is space within the house? Or is the house within space?
Or are both space and the house within the seeing eye?
Is the eye within the mind? Or is the mind within the eye?
Or are both the eye and the mind within you?

Saint-Poet Kanakadasa

What a deep rumination on the human mind and perception by the 16th-century Saint-Poet Kanakadasa (ಕನಕದಾಸ)!

If you enjoyed reading this verse, trust me, it would sound even better if you read it in Kannada (ಕನ್ನಡ), the language in which it was originally composed. After all, a translation can give you only one of the many possible interpretations of the original, and it does so at the cost of rhyme, rhythm, figures of speech, form and mood.

Kannada is rich in literature and culture, and it is the native language of as many as 44 million people in southern India. But that’s only about 0.5% of all the people in the world today. So for the remaining 99.5%, the best and easiest way to read Kannada is Romanization. Thanks to British colonization (sic)!

Romanization is simply the conversion of text from a different writing system to the Roman (Latin) script. In our case, that means from Kannada to Roman. Over time, multiple systems and standards such as Hunterian transliteration, ITRANS, IAST and ISO 15919 (2001) have evolved to romanize Indic scripts.

For one of my recent website projects for an academic institution, we had to transliterate a lot of text. While doing this, we faced two major problems:

  1. Which standard of transliteration to use?
  2. How to transliterate so much text?

While Hunterian is the most commonly adopted system, it has drawbacks, such as not distinguishing retroflex from dental consonants (e.g. ಡ and ದ are both represented by d). Marking the difference with capital letters would make the text hard to read. So after long deliberations, we decided to go with ISO 15919 (2001), a standard transliteration convention that uses diacritics to map the much larger set of Indic consonants and vowels to the Latin script. That solved the first problem.
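To make the retroflex/dental point concrete, here is a small Python sketch. The two tables are hand-written for illustration, cover only six consonants, and are not taken from any library. `collisions` is a hypothetical helper name.

```python
# Hand-written illustration tables (assumption: simplified, six consonants only).
# In each pair the retroflex letter comes first and the dental letter second.
HUNTERIAN = {"ಟ": "t", "ತ": "t", "ಡ": "d", "ದ": "d", "ಣ": "n", "ನ": "n"}
ISO_15919 = {"ಟ": "ṭ", "ತ": "t", "ಡ": "ḍ", "ದ": "d", "ಣ": "ṇ", "ನ": "n"}


def collisions(table):
    """Group letters by romanization and keep groups with more than one letter."""
    groups = {}
    for letter, roman in table.items():
        groups.setdefault(roman, []).append(letter)
    return {roman: letters for roman, letters in groups.items() if len(letters) > 1}
```

Calling `collisions(HUNTERIAN)` reports three ambiguous romanizations (t, d and n), while `collisions(ISO_15919)` comes back empty. That is why a reader can recover the original Kannada letter from the ISO 15919 form.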

Now for the second: typing out pages of text would be really tedious. Whenever I find something monotonous that follows a pattern, I look for ways to automate it. So I searched online and found some truly awesome tools and scripts for transliteration. Some limited the length of the input text, while others were not accurate enough. Some were so heavy and complex that I felt like I had asked for a banana and got a gorilla holding the banana and the entire jungle. No doubt they were developed for a much larger purpose, but I needed something simple and lightweight that would just do the job. So I ended up creating one on my own: om_transliterator.

Sample input and output of om_transliterator

Maybe, if I had searched more, I would have found it. But my desire to save time and effort made sure that it found me.

Here are some technical details.

What is om_transliterator?

om_transliterator is a Python package for transliterating text from Kannada (Knda) Unicode script to Latin (Latn) script as per ISO 15919.
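For the curious, here is a rough idea of what such a conversion involves. This is a toy sketch written for this explanation, not the package's actual implementation. The tables cover only the handful of letters in the word used later, and `toy_knda_to_latn` is a made-up name. The key point is that a Kannada consonant carries an inherent "a", which a following vowel sign replaces and a virama removes.

```python
import unicodedata

# Toy tables (assumption: a tiny subset, written with escapes to pin the code points).
CONSONANTS = {
    "\u0ca8": "n",       # ನ
    "\u0cae": "m",       # ಮ
    "\u0caf": "y",       # ಯ
    "\u0cb3": "\u1e37",  # ಳ -> ḷ
    "\u0c97": "g",       # ಗ
}
VOWEL_SIGNS = {
    "\u0cbe": "\u0101",  # ಾ -> ā
    "\u0cbf": "i",       # ಿ
    "\u0cc0": "\u012b",  # ೀ -> ī
    "\u0cc1": "u",       # ು
    "\u0cc6": "e",       # ೆ
    "\u0cca": "o",       # ೊ
    "\u0ccb": "\u014d",  # ೋ -> ō
}
VIRAMA = "\u0ccd"        # ್ suppresses the inherent vowel
INHERENT_VOWEL = "a"


def toy_knda_to_latn(text: str) -> str:
    """Romanize the letters in the toy tables; pass everything else through."""
    # NFC makes two-part vowel signs arrive as single code points.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in CONSONANTS:
            out.append(ch)  # spaces, punctuation, unknown letters
            i += 1
            continue
        out.append(CONSONANTS[ch])
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[nxt])  # vowel sign replaces the inherent "a"
            i += 2
        elif nxt == VIRAMA:
            i += 2  # bare consonant, no vowel at all
        else:
            out.append(INHERENT_VOWEL)
            i += 1
    return "".join(out)
```

With these tables, ನಿನ್ನೊಳು comes out as ninnoḷu. A real transliterator also has to handle independent vowels, anusvara, visarga, digits and the rest of the consonants, which is where a tested package earns its keep.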

How to install om_transliterator?

To install om_transliterator with pip, run:

pip install om-transliterator

To install om_transliterator from source, first, clone the repository and then run:

python setup.py install

How to use om_transliterator?

Remember, om_transliterator is not an end-user tool. It’s a simple, lightweight Python package meant to be used in the backend of your own project. Here’s an example of typical usage:

#!/usr/bin/env python
from om_transliterator import Transliterator
transliterator = Transliterator()
original_text = "ನೀ ಮಾಯೆಯೊಳಗೋ ನಿನ್ನೊಳು ಮಾಯೆಯೋ"
transliterated_text = transliterator.knda_to_latn(original_text)
print(transliterated_text)
# output: nī māyeyoḷagō ninnoḷu māyeyō
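Since the whole point was to avoid typing pages of text by hand, you will probably want to run this over many lines at once. Below is a minimal sketch of one way to do that. `transliterate_lines` is a hypothetical helper, not part of the package, and it only assumes that the conversion function takes one string and returns one string, as `knda_to_latn` does in the example above.

```python
from typing import Callable, Iterable, Iterator


def transliterate_lines(
    lines: Iterable[str], convert: Callable[[str], str]
) -> Iterator[str]:
    """Yield each line converted; blank lines pass through unchanged."""
    for line in lines:
        stripped = line.rstrip("\n")
        # Skip the conversion call for empty or whitespace-only lines.
        yield convert(stripped) if stripped.strip() else stripped
```

With the package installed, you could pass `Transliterator().knda_to_latn` as `convert` and an open text file as `lines`. Keeping the converter as a parameter also makes the helper easy to test with a stand-in function.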

How to contribute?

This project is open source under the Apache License 2.0, and it exists thanks to contributions and support from people like Dinesh Shenoy, Prof. Purushothama Bilimale, Srikanth Lakshmanan, Sandesh B Suvarna, Arjun Shetty and many others.

The package is available on PyPI at https://pypi.org/project/om-transliterator/ . You can also contribute extensions and fixes at https://github.com/shrirambhat/om_transliterator .

Let’s wait and watch for what finds me next…
