[reportlab-users] Issues rendering Burmese/Myanmar in PDFs

Bobby Hagen

Jan 7, 2021, 10:54:06 AM
to reportl...@lists2.reportlab.com

Previously posted on the Google group, but the images were missing.  Hopefully they come through this time. I've also clarified a couple of steps and wanted to follow up via the list, just in case, as we're still seeing this even with the latest reportlab-3.5.59 available from PyPI:

We're encountering some problems attempting to build PDFs using Burmese.  The problem doesn't seem restricted to a specific font, as at least NotoSansMyanmar (https://www.google.com/get/noto/#sans-mymr) and Padauk (https://fonts.google.com/specimen/Padauk) both have these issues.  Although independent characters seem to be represented correctly, some of the vowels and medials (and perhaps other items as well) seem to have some rendering issues.

The issue we originally ran into was that U+103C MYANMAR CONSONANT SIGN MEDIAL RA was misplaced: it appeared after the consonant instead of decorating it, as described in http://unicode.org/notes/tn11/UTN11_3.pdf.  As a simple example, the string "ကြ" (code points U+1000 U+103C) would appear as two separate characters "က ြ" instead of being combined. If another character followed that first "က", the " ြ" would incorrectly decorate that second character instead.
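For reference, here is a small sketch (standard library only, not part of ReportLab) that lists the code points, Unicode names and general categories of a string. The helper name `describe` is made up for illustration. It only shows the logical order stored in the text, which is what the renderer has to turn into the combined form.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (
            "U+%04X" % ord(ch),
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
        )
        for ch in text
    ]


if __name__ == "__main__":
    # Same four code points as the test string used in the snippet in this post.
    for row in describe("\u1000\u103C\u1031\u1038"):
        print(*row)
```

Only the first code point is a letter; the others are combining marks, so they need to be positioned relative to the base consonant rather than drawn one after another.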

An example snippet showing this behavior. It should render "ကြေး", but the medial and vowel seem to be placed incorrectly:

# coding=utf-8
def main():
    from reportlab.pdfgen.canvas import Canvas
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.pdfbase.pdfmetrics import registerFont
    from reportlab.lib.pagesizes import A10

    canv = Canvas('text-on-image.pdf',pagesize=A10)
    registerFont(TTFont('notosans','NotoSansMyanmar-Regular.ttf'))

    canv.setFont('notosans',14)
    canv.setFillColor((1,0,0)) #change the text color to red
    # U+1000 KA, U+103C MEDIAL RA, U+1031 VOWEL SIGN E, U+1038 VISARGA
    text = u"\u1000\u103C\u1031\u1038"
    canv.drawString(25,50,text)
    canv.save()

if __name__=='__main__':
    main()

This yields the following:

image.png

I've used the code points for it in the snippet, but you can see from http://zawgyi-unicode-test.appspot.com/convertui/ how that should render:

image.png

And replacing with the actual characters in the code snippet doesn't fix the issue.

Robin Becker

Jan 7, 2021, 4:52:27 PM
to reportlab-users, Bobby Hagen
Not sure exactly what the google group is, but here is where we try and answer questions.

I'm not an expert, but I suspect this is the same issue we have had in the past with Thai diacritics etc etc.

The issue may be caused by the way the TTF is embedded in the PDF. To avoid very large PDFs we embed subsets; that may
alter the way that the text is represented in the PDF content stream. Additionally, we may be losing something in the
font data that's important for positioning, in particular vertical metrics. I think the original contribution was for
Latin-type fonts, where that information is not required.
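One way to see what positioning data a font carries in the first place: OpenType fonts usually keep their glyph substitution and positioning rules in the GSUB and GPOS tables. The following is only a sketch using the standard library, assuming a plain TrueType/OpenType file (not a .ttc collection); the function names are made up for illustration. It reports which tables are present in the original font file and says nothing about what our subsetter keeps.

```python
import struct


def list_sfnt_tables(data):
    """Return the table tags from the table directory of a TrueType/OpenType font."""
    # Offset table: sfntVersion (4 bytes), numTables (2 bytes), then three
    # 2-byte search fields, 12 bytes in total.
    _version, num_tables = struct.unpack(">4sH", data[:6])
    tags = []
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        start = 12 + 16 * i
        tag, _checksum, _offset, _length = struct.unpack(
            ">4sIII", data[start:start + 16]
        )
        tags.append(tag.decode("latin-1"))
    return tags


def has_layout_tables(data):
    """Report whether the GSUB and GPOS layout tables are listed in the font."""
    tags = set(list_sfnt_tables(data))
    return {"GSUB": "GSUB" in tags, "GPOS": "GPOS" in tags}
```

For example, `has_layout_tables(open('NotoSansMyanmar-Regular.ttf', 'rb').read())` would show whether the font from the original snippet lists those tables.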

I think PDF can represent these glyphs properly, but then the renderer application also needs to do the right thing.

To fix this we would need to find a different way to use the font in the PDF to avoid the problems we have in our ttf
subset algorithm.


--
Robin Becker
_______________________________________________
reportlab-users mailing list
reportl...@lists2.reportlab.com
https://pairlist2.pair.net/mailman/listinfo/reportlab-users

Robin Becker

Jan 8, 2021, 7:15:31 AM
to reportlab-users, Bobby Hagen
I tried creating a small test PDF with libreoffice calc so I can see how they are embedding and so forth.

Can you say if this is correct? I entered the characters as per your text example, i.e. 1000 103C 1031 1038, and then
exported to PDF.

--
Robin Becker
burmese-test.pdf

Anurag Bansal

Mar 23, 2022, 4:27:07 PM
to reportlab-users
Hello,

Is there a working solution for this issue? I am facing a similar issue with several Indic languages whose scripts exhibit the same kind of behavior.

If not, are there any other Python libraries that might solve this issue? So far, I am aware that PIL can render the intended text correctly, but then again I can't use it for PDF creation (right?).

Anurag Bansal