UTF-8 support

Delusional Insanity

Jan 15, 2023, 5:14:21 PM
to reportlab-users
I can't create a PDF from UTF-8 encoded text. I get a document full of black squares (see the attachment).

My code:

```python

import tempfile

from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
from reportlab.lib.pagesizes import letter

from faker import Faker

FAKER = Faker(locale="hy-AM")
text = FAKER.text(max_nb_chars=10_000)  # use the FAKER instance defined above
text = text.replace("\n", " ")

pdf_name_reportlab = tempfile.NamedTemporaryFile(prefix="reportlab_", suffix=".pdf", dir="/tmp/tmp/").name

def generate_pdf_reportlab():
    styles = getSampleStyleSheet()
    style_paragraph = styles['Normal']
    story = []

    doc = SimpleDocTemplate(
        pdf_name_reportlab,
        pagesize=letter,
        bottomMargin=.4 * inch,
        topMargin=.6 * inch,
        rightMargin=.8 * inch,
        leftMargin=.8 * inch
    )

    paragraph = Paragraph(text, style_paragraph)
    story.append(paragraph)
    doc.build(story)

generate_pdf_reportlab()
```

I also tried a TTF font (Vera), but it didn't work either.

```python

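import io

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont  # required for the TTFont calls below

# Register the four Bitstream Vera faces
pdfmetrics.registerFont(TTFont('Vera', 'Vera.ttf'))
pdfmetrics.registerFont(TTFont('VeraBd', 'VeraBd.ttf'))
pdfmetrics.registerFont(TTFont('VeraIt', 'VeraIt.ttf'))
pdfmetrics.registerFont(TTFont('VeraBI', 'VeraBI.ttf'))

# Assumption: the snippet did not show how `buffer` was defined;
# an in-memory buffer is used here so the fragment is complete.
buffer = io.BytesIO()

doc = SimpleDocTemplate(
    buffer,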
    pagesize=letter,
    bottomMargin=.4 * inch,
    topMargin=.6 * inch,
    rightMargin=.8 * inch,
    leftMargin=.8 * inch,
    initialFontName="Vera"
)
```

Screenshot from 2023-01-15 23-12-02.png

Matthias Kreier

May 21, 2024, 3:03:36 AM
to reportlab-users
This looks like a problem with the font you are using, not with UTF-8 itself. If you copy and paste some of these black squares into Word or Google Docs, you will see that the correct text is there. The problem is (probably) that the font you are using does not contain the glyphs for the language of your text. In your code I found FAKER = Faker(locale="hy-AM"), which implies that your text is in Armenian. The Armenian glyphs are not part of the Vera font. Try a font that has the Armenian glyphs and I expect you will be able to see the whole text.

A font that has Armenian glyphs is this one: https://fonts.google.com/noto/specimen/Noto+Sans+Armenian 
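
A minimal sketch of that suggestion, assuming the font has been downloaded as a .ttf file. The function name build_armenian_pdf, the registered name "NotoSansArmenian" and the style name "ArmenianNormal" are made up for illustration. Paragraph takes its font from its style, so the registered font is set on the style itself:

```python
from pathlib import Path


def build_armenian_pdf(text: str, font_path: str, out_path: str) -> None:
    """Render `text` into a PDF using a TrueType font that covers Armenian."""
    # Fail early with a clear message if the font file is not where we expect it.
    if not Path(font_path).is_file():
        raise FileNotFoundError(f"Font file not found: {font_path}")

    # ReportLab imports are kept local so the path check above runs first.
    from reportlab.lib.pagesizes import letter
    from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.platypus import Paragraph, SimpleDocTemplate

    # "NotoSansArmenian" is only the name chosen here to register the font under.
    pdfmetrics.registerFont(TTFont("NotoSansArmenian", font_path))

    # Derive a style from 'Normal' and point it at the registered font;
    # the stock 'Normal' style would otherwise keep its default font.
    base = getSampleStyleSheet()["Normal"]
    style = ParagraphStyle("ArmenianNormal", parent=base, fontName="NotoSansArmenian")

    doc = SimpleDocTemplate(out_path, pagesize=letter)
    doc.build([Paragraph(text, style)])
```

It could then be called as build_armenian_pdf(text, "NotoSansArmenian-Regular.ttf", "armenian.pdf"), where the font file name is an assumption about what the download contains and may differ on your machine.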

Hope it helps.

Matthias
