Problem with utf-8 encoding


Kudra

Apr 2, 2010, 10:52:26 AM
to Pisa XHTML2PDF Support
Hi guys,

I have the following problem: I want to create a PDF in Hungarian
using pisa. I have managed to generate a raw PDF, but with some
errors: the Hungarian characters ő ű Ő Ű are not rendered (the text
is UTF-8 encoded).

Here is my Python code:

context = Context({'object': object, 'LANGUAGE_CODE': language})
html = template.render(context)
# Insert page skips
html = html.replace('-pageskip-', '<pdf:nextpage />')

# Write the output to an HTML file for debugging
f = open('pdf.html', 'w')
print >>f, html.encode("UTF-8")
f.close()

result = StringIO.StringIO()
pdf = pisa.pisaDocument(
    StringIO.StringIO(html.encode("UTF-8")), result,
    link_callback=fetch_resources, encoding="utf-8", xhtml=True)

if not pdf.err:
    object.add_copy()
    return result.getvalue()
else:
    return False

And here is how my template file begins:

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<style>
@font-face {
font-family: "arial";
src: url("http://www.toolpart.hu/static/media/medialibrary/2010/04/arial.ttf");
}

html {
font-family: "arial";
}

...

The Arial font family supports these characters. I checked the
pdf.html file and I can see the ő Ő ű Ű characters in it, so the
characters are there.

What do you think the solution could be? Thanks in advance for your
help.

Regards,
Laszlo

David Bolton

Apr 3, 2010, 2:34:00 AM
to xhtm...@googlegroups.com
On 4/2/2010 9:52 AM, Kudra wrote:
> Hi guys,
>
> I have the following problem: I want to create a PDF in Hungarian
> language. I want to generate the pdf file with pisa. I have managed to
> generate a raw PDF but with some errors, it is not generating the
> hungarian characters: ő ű Ő Ű (UTF-8 character encoding).
>
>

If it helps to have a working reference script, see:
http://mscore.svn.sourceforge.net/viewvc/mscore/trunk/mscore/manual/

I am using pisa to create PDFs in Hungarian (among other languages).


> ...
>
> Arial font-family supports these characters, I checked the pdf.html
> file and I can see the ő Ő ű Ű characters in it, so the characters are
> there.
>

Just so that you are aware, Firefox and WebKit browsers substitute
characters from other fonts if they can't find the character in the
specified font, so looking at the text in the browser is not a reliable
way to test whether the font includes a special character.
Internet Explorer shows a square if it can't find the character in the
specified font.

David

Kudra

Apr 3, 2010, 11:37:13 AM
to Pisa XHTML2PDF Support
Hi,

Thanks for the advice.

I have checked the generated HTML file in Internet Explorer and the
characters are fine, but in the generated PDF they are still wrong. I
want to use UTF-8 because I have text in Hungarian and Turkish.

In the Python code, the line that generates the PDF is the following:

pdf = pisa.pisaDocument(
    StringIO.StringIO(html.encode("UTF-8")), result,
    link_callback=fetch_resources, encoding="utf-8", xhtml=True)

In the template file, the corresponding lines are:

{% load i18n %}<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style>
@font-face {
font-family: "DejaVuSans";
src: url("http://www.toolpart.hu/static/media/medialibrary/2010/04/DejaVuSans.ttf");
}

html {
font-family: "DejaVuSans";
}

I now use DejaVuSans.ttf, which you also use in the project you
mentioned in your mail. Do you have any other idea what could be
causing the problem?

Regards,
Laszlo

fox1986

Apr 9, 2010, 6:37:52 AM
to Pisa XHTML2PDF Support
I have exactly the same problem here with Polish and Russian characters.
My HTML is 100% valid UTF-8 and looks good, but when I try to make a
PDF out of it with pisa, some Polish characters are replaced by black
rectangles and the Russian characters are all black.

I have tried changing fonts several times: Arial Unicode MS and now
DejaVuSans, as in the pisa documentation. Unfortunately it seems to
have no effect. The PDF always tries to use Helvetica (on Mac) or Arial
(on Windows). Is there no way to change the font? The approach from the
documentation (@font-face etc.) does not work.

Please help, I'm going crazy :-(
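
A possible explanation for this pattern, which the thread itself does not confirm, is that the renderer fell back to a built-in font limited to a Western single-byte character set. The sketch below assumes that set is close to Windows-1252 (an assumption) and lists which characters of a sample text fall outside it. The helper name `unsupported_chars` and the sample strings are illustrative only.

```python
# -*- coding: utf-8 -*-

def unsupported_chars(text, codec="cp1252"):
    """Return the characters of `text` that `codec` cannot encode,
    in order of first appearance and without duplicates."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

# Sample texts (illustrative pangram-style strings, not from the thread).
hungarian = u"árvíztűrő tükörfúrógép"
polish = u"zażółć gęślą jaźń"
russian = u"привет"

for label, sample in [("hu", hungarian), ("pl", polish), ("ru", russian)]:
    print(label, len(unsupported_chars(sample)), "character(s) outside cp1252")
```

If the assumption holds, this would match the symptoms reported here: only ő and ű fail in Hungarian, some but not all Polish letters fail, and every Russian letter fails.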

David Bolton

Apr 9, 2010, 10:51:43 AM
to xhtm...@googlegroups.com
A useful debugging process is to compare your code to mine (linked
below) and step-by-step change the reference version to match yours
until the reference version no longer works. Then you will know what
difference is causing the problem.

One difference you might start with is that you are using HTTP for the
font URLs. I am using relative URLs that point to the local disk. Once
you find out what the problem is, share what you discovered with the list.

David

fox1986

Apr 12, 2010, 3:09:05 AM
to Pisa XHTML2PDF Support
I tried it the way you did, but it seems the font is always
ignored.

===============================================
My CSS (I tried other fonts as well; they are also ignored):
===============================================
/* Normal */
@font-face {
    font-family: "DejaVu LGC Sans";
    src: url(font/DejaVuLGCSans.ttf);
}


@page {
    size: a4;
    margin-left: 2.5cm;
    margin-right: 1.7cm;
    margin-top: 2.5cm;
    margin-bottom: 4.0cm;

    @frame footer {
        -pdf-frame-content: footerContent;
        margin-left: 2.5cm;
        margin-right: 1.7cm;
        bottom: 1.0cm;
        height: 2.5cm;
    }
}

body {
    font-family: "DejaVu LGC Sans", serif;
}

===================================
My Python code ("code" is the HTML to render)
===================================
x = open(dest, "wb")
code = code.encode('UTF-8')
pdf = pisa.CreatePDF(StringIO.StringIO(code), x, encoding='UTF-8')
x.close()

Whatever I do, the PDF is always rendered with the default
font: Helvetica (Mac) or Arial (Windows). What am I doing wrong?

fox1986

Apr 12, 2010, 3:33:21 AM
to Pisa XHTML2PDF Support
OK, I found the reason now:

For some annoying reason, fonts given with a relative path are simply
ignored. So "src: url(font/DejaVu Sans.ttf)" does not work. I really
hope there is a workaround. Using absolute paths here is not very
useful, especially if you want to run multiple instances of your
application on different servers. Does anyone have an idea?
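
One way to avoid hard-coding an absolute path is to compute it at render time and inject the resulting rule into the stylesheet. This is only a sketch under assumptions the thread does not confirm. The helper `font_face_css`, its `base_dir` parameter, and the `font` subdirectory are hypothetical names, and whether pisa accepts a quoted filesystem path in `url(...)` on every platform has not been verified here.

```python
import os

def font_face_css(family, filename, base_dir):
    """Build an @font-face rule whose src is an absolute path to
    base_dir/font/filename (hypothetical directory layout)."""
    path = os.path.abspath(os.path.join(base_dir, "font", filename))
    # Use forward slashes so the value stays readable inside url(...).
    path = path.replace(os.sep, "/")
    return '@font-face {\n    font-family: "%s";\n    src: url("%s");\n}\n' % (
        family, path)

# Each deployment passes its own install directory, so the stylesheet
# itself contains no server-specific path.
css = font_face_css("DejaVu LGC Sans", "DejaVuLGCSans.ttf", os.getcwd())
print(css)
```

The generated rule could then be prepended to the rest of the CSS before the HTML is handed to pisa.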

Kudra

Apr 13, 2010, 7:46:18 AM
to Pisa XHTML2PDF Support
Hi,

Thanks for your help. My problem is solved now.

I have also tried it with

src: url( {{ MEDIA_URL }}fonts/DejaVuSans.ttf);

but it didn't work that way.

Then I used an absolute path (a Django URL path, not a Linux filesystem
path, so I think it should be no problem if you run your application on
different servers).

@font-face {
font-family: "DejaVu";
src: url(/static/fonts/DejaVuSans.ttf);
}

where /static/ is my media root directory.

Hope it helps.

Regards,
Laszlo
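
The thread never shows the `fetch_resources` function passed as `link_callback` in the first post. A common shape for such a callback, sketched here as an assumption and not as the poster's actual code, maps a URL path such as `/static/fonts/DejaVuSans.ttf` to a file on disk. `MEDIA_URL` and `MEDIA_ROOT` below are hypothetical stand-ins for the Django settings, and the `(uri, rel)` signature is the one pisa is understood to call; whether pisa routes `@font-face` URLs through the callback may depend on the version.

```python
import os

# Hypothetical values standing in for the Django settings.
MEDIA_URL = "/static/"           # URL prefix used in the templates
MEDIA_ROOT = "/var/www/static"   # assumed directory on disk

def fetch_resources(uri, rel=None):
    """Map a URL under MEDIA_URL to a filesystem path under MEDIA_ROOT.
    Anything else is returned unchanged."""
    if uri.startswith(MEDIA_URL):
        relative = uri[len(MEDIA_URL):]
        return os.path.join(MEDIA_ROOT, *relative.split("/"))
    return uri

print(fetch_resources("/static/fonts/DejaVuSans.ttf"))
```

With a mapping like this, the stylesheet keeps a server-independent URL path while each server resolves it against its own media directory.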
