Reportlab adding/changing characters to Hindi words

Steve Young

Oct 28, 2014, 9:43:07 PM
to reportl...@googlegroups.com
I am making progress using different fonts to output various languages correctly.  I just encountered a strange error with Hindi and I am not sure how to investigate what is wrong.

The problem is that in some words ReportLab adds or swaps a character.  For example:

from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfgen.canvas import Canvas

# Register a TrueType font that contains Devanagari glyphs
pdfmetrics.registerFont(TTFont('FreeSans', 'FreeSans.ttf'))

a = 'किताब/पुस्तक'

c = Canvas('hindi_error.pdf')
c.setFont('FreeSans', 32)
c.drawString(1*inch, 8*inch, a)
c.showPage()
c.save()

The resulting pdf is attached, and the text is:

As you can see a character has been added to the beginning of the word.  I have tried a couple of fonts but get the same results in each.  Any ideas on what steps I should take to figure out what is happening?

Thank you.


hindi_error.pdf

Steve Young

Oct 29, 2014, 6:02:26 PM
to reportl...@googlegroups.com
Hindi has some interesting characters that morph into new ones when certain letters are used together:

Special characters

In some words, written vowels change their form in order to join up with consonants.

- With ‘i’: ि – कि [ki] is a combination of क + इ (k + i). The character ि is added to the left and above.

There are about a dozen of these rules. This is the one that explains what is going on in the above example.  When typing the Hindi word किताब ('book' in English) you press क then  ि and the result is कि.  

ReportLab does not seem to know the rule, and prints the ि as a separate character.

Back in the console:

>>> bytes('क', 'utf-8')
b'\xe0\xa4\x95'

>>> bytes('कि', 'utf-8')
b'\xe0\xa4\x95\xe0\xa4\xbf'

The character order shows the क first, and the  ि second. So does the computer OS or other layer know the rules for Hindi and automatically change the characters? And reportlab does not take notice of this?
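To double-check the stored order, the standard unicodedata module can list each code point by name. This is only a small sketch; the describe helper is a name I made up for illustration:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The consonant comes first and the dependent vowel sign second,
# even though the vowel sign is displayed to the left of the consonant.
for code_point, name in describe("कि"):
    print(code_point, name)
```

So the string itself is stored in typing order, and whatever draws the text has to place the vowel sign on the left.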

Is there a workaround, or am I unable to print Hindi in ReportLab?

Thanks again.

Steve Young

Oct 29, 2014, 8:50:52 PM
to reportl...@googlegroups.com
It seems that around 2006 most major operating systems gained the ability to correctly input and render Hindi and other South Asian languages, including the dependent vowels and other irregularities.

I see 3 options:
  1. Patch reportlab to use the OS's built-in ability to correctly deal with Hindi.
  2. Add a Hindi patch to reportlab with the rules to properly deal with Hindi.
  3. Look for another solution (such as creating the pages in html, doing screenshots, using the images in the pdf...)
Any suggestions or other ideas are welcome. 
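
As a very rough sketch of what option 2 might start from, the function below handles only the one rule from the example: it moves the vowel sign ि in front of the consonant (or consonant + virama cluster) it follows, so that a renderer that draws characters strictly in stored order would put it on the left. The function and constant names are hypothetical, nothing here is part of ReportLab, and the other rules (conjunct forms, reph, nukta and so on) are not handled, so I would not expect fully correct Hindi from this alone:

```python
I_MATRA = "\u093f"  # DEVANAGARI VOWEL SIGN I, displayed left of its consonant
VIRAMA = "\u094d"   # DEVANAGARI SIGN VIRAMA, joins consonants into a cluster

def is_consonant(ch):
    # Devanagari consonants: U+0915..U+0939, plus U+0958..U+095F
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095f"

def move_i_matra_before_consonant(text):
    """Reorder text so each I_MATRA precedes the cluster it was typed after."""
    out = []
    for ch in text:
        if ch == I_MATRA and out and is_consonant(out[-1]):
            # Walk back over a cluster of the form consonant (virama consonant)*
            pos = len(out) - 1
            while pos >= 2 and out[pos - 1] == VIRAMA and is_consonant(out[pos - 2]):
                pos -= 2
            out.insert(pos, ch)
        else:
            out.append(ch)
    return "".join(out)
```

The reordered string would then be passed to drawString in place of the original. Whether the result looks acceptable probably depends on the font, since the consonant and vowel sign glyphs are still drawn independently.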

Thank you.