Unicode support (fork)

Koen Deforche

Apr 11, 2011, 6:40:15 AM
to libHaru
Hey all,

For those who need Unicode support in libharu, I've pushed my work to GitHub:
https://github.com/kdeforche/libharu/tree/

This fork contains:
- full support for Unicode (16 bit) for TrueType fonts, using UTF-8
encoding for the text functions in the libharu API.
- improved TrueType font support, supporting more TrueType fonts
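
As an illustration of what "16-bit Unicode via UTF-8" means for a caller, here is a minimal sketch of decoding a UTF-8 string into 16-bit code points. It is not code from the fork: the name utf8_to_ucs2 is hypothetical, and the sketch assumes that code points above U+FFFF are simply rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libharu: decode a NUL-terminated UTF-8
 * string into 16-bit code points (Basic Multilingual Plane only).
 * Returns the number of code points written, or -1 on malformed input,
 * on a code point above U+FFFF, or if `out` is too small.
 * Overlong encodings and surrogate values are not checked in this sketch. */
static int utf8_to_ucs2(const char *in, uint16_t *out, size_t out_cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp;
        int extra;                       /* number of continuation bytes */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else return -1;   /* stray continuation byte or 4-byte sequence */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;               /* missing continuation byte */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (n == out_cap)
            return -1;                   /* output buffer is full */
        out[n++] = (uint16_t)cp;
    }
    return (int)n;
}
```

A caller would pass the same UTF-8 bytes it hands to the text functions; each returned value is what a 16-bit font encoding has to map to a glyph.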

The Unicode support is now complete in the sense that text
selection also works again, and the generated PDFs open without errors in
all PDF readers and PDF tools I tested.

An example of a generated file is:
http://www.emweb.be/public/haru/utf8.pdf (generated from
http://www.emweb.be/public/haru/utf8.html)

The fork also contains a number of other patches which were previously
submitted to libharu.
Hopefully these changes will get merged upstream.

Regards,
koen

Antony Dovgal

Apr 11, 2011, 7:05:04 AM
to lib...@googlegroups.com
Hello Koen.

The example PDF does look impressive, thanks.
There are some things I'd like to discuss first, though.

1)
diff --git a/src/hpdf_font_cid.c b/src/hpdf_font_cid.c
index 7eac650..6d4d5b5 100644
--- a/src/hpdf_font_cid.c
+++ b/src/hpdf_font_cid.c
@@ -789,7 +789,7 @@ UINT16ToHex  (char     *s,

     *s++ = '<';

-    if (b[0] != 0) {
+    if (1 || b[0] != 0) {

This line looks wrong, doesn't it?
I see there is some explanation right above, but this looks more like a hack than a solution..

2) Would you like to take over Libharu as a project? =)


--
Wbr,
Antony Dovgal
---
http://pinba.org - realtime statistics for PHP

Koen Deforche

Apr 11, 2011, 8:22:26 AM
to lib...@googlegroups.com
Hey Antony,

2011/4/11 Antony Dovgal <to...@daylessday.org>:


> 1)
> diff --git a/src/hpdf_font_cid.c b/src/hpdf_font_cid.c
> index 7eac650..6d4d5b5 100644
> --- a/src/hpdf_font_cid.c
> +++ b/src/hpdf_font_cid.c
> @@ -789,7 +789,7 @@ UINT16ToHex  (char     *s,
>
>     *s++ = '<';
>
> -    if (b[0] != 0) {
> +    if (1 || b[0] != 0) {
>
> This line looks wrong, doesn't it?
> I see there is some explanation right above, but this looks more like a hack
> than a solution..

Yes. It needs to be fixed, as I would guess it affects other
encodings. The fix is probably to take the range into account and
decide, based on the largest number, how to pad the two numbers in a
range.
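
The range-based idea could look roughly like the sketch below. It is not code from libharu or from the fork: format_hex_range is a hypothetical helper, and it assumes that both ends of a range should be written with the same number of hex digits, chosen from the larger value.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not libharu code): format a code range as
 * "<lo> <hi>" where both values use the same width, two hex digits if
 * the larger bound fits in one byte and four otherwise.
 * `out` must hold at least 14 bytes ("<XXXX> <XXXX>" plus NUL). */
static void format_hex_range(char *out, uint16_t lo, uint16_t hi)
{
    uint16_t largest = lo > hi ? lo : hi;
    int digits = largest > 0xFF ? 4 : 2;   /* pad width from largest value */

    snprintf(out, 14, "<%0*X> <%0*X>",
             digits, (unsigned)lo, digits, (unsigned)hi);
}
```

With this approach the range 0x20 to 0x7E keeps two digits per value, while 0x0020 to 0x20AC is padded to four digits on both ends, so neither single-byte nor two-byte encoders would need the unconditional `1 ||`.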

To be honest, I stopped working on the patch at the point where the
Unicode support worked well, and have indeed not looked much at whether
the patch broke other encodings. Would the included tests help to
determine whether it works for other encoders?

> 2) Would you like to take over Libharu as a project? =)

It isn't my ambition, no, I lack basic knowledge on the PDF format,
and also lack time, and my needs for libharu are also quite specific
so I would also lack the scratch-your-own-itch drive for many parts of
the library.

But because libharu is a mature library as it is, it could be useful
to open up its maintenance to more than one person? In times of git
and GitHub, this is easier than ever. For this to work, the wiki
and bugs would also need to move to GitHub. I wouldn't mind being a
committer and taking responsibility for specific parts of libharu.

Regards,
koen

Antony Dovgal

Apr 11, 2011, 8:32:21 AM
to lib...@googlegroups.com
On 04/11/2011 04:22 PM, Koen Deforche wrote:
> To be honest, I stopped with the patch to the point where the unicode
> support worked well, but indeed have not looked too much at whether
> the patch broke other encodings. Would the included tests help to
> conclude that it does/does not work for other encoders ?

Probably.
I'll need to take a look at it first, though..

>> 2) Would you like to take over Libharu as a project? =)
>
> It isn't my ambition, no, I lack basic knowledge on the PDF format,
> and also lack time, and my needs for libharu are also quite specific
> so I would also lack the scratch-your-own-itch drive for many parts of
> the library.

Fair enough.



> But because libharu is a mature library as it is, it could be useful
> to open up its maintenance to more than one person? In times of git
> and github, this is more easy than ever. For this to work, the wiki
> and bugs would also need to move to github. I wouldn't mind being a
> committer and take responsibility of specific parts of libharu.

Not sure how to move wiki and bugs (are there any tools to automate that?),
but that shouldn't prevent you from committing, right?

I just added you as a collaborator on Github, is there anything I need to do next?
I'm not quite familiar with this part of Github functionality..

Koen Deforche

Apr 11, 2011, 10:38:16 AM
to lib...@googlegroups.com
Hey Antony,

2011/4/11 Antony Dovgal <to...@daylessday.org>:


> On 04/11/2011 04:22 PM, Koen Deforche wrote:
>>
>> To be honest, I stopped with the patch to the point where the unicode
>> support worked well, but indeed have not looked too much at whether
>> the patch broke other encodings. Would the included tests help to
>> conclude that it does/does not work for other encoders ?
>
> Probably.
> I'll need to take a look at it first, though..
>
>>>  2) Would you like to take over Libharu as a project? =)
>>
>> It isn't my ambition, no, I lack basic knowledge on the PDF format,
>> and also lack time, and my needs for libharu are also quite specific
>> so I would also lack the scratch-your-own-itch drive for many parts of
>> the library.
>
> Fair enough.
>
>>
>> But because libharu is a mature library as it is, it could be useful
>> to open up its maintenance to more than one person? In times of git
>> and github, this is more easy than ever. For this to work, the wiki
>> and bugs would also need to move to github. I wouldn't mind being a
>> committer and take responsibility of specific parts of libharu.
>
> Not sure how to move wiki and bugs (are there any tools to automate that?),
> but that shouldn't prevent you from committing, right?

We have previously migrated from traditional wiki to redmine wiki with
a script that we found somewhere. Perhaps there are already such
scripts for traditional wiki to github wiki conversion ? The wiki
needs a number of updates, related to my modifications, but also other
things which were not accurate/incomplete in the (otherwise very
useful) documentation.

It is not an urgent issue, but I've also seen others recently ask for
this in this group.

> I just added you as a collaborator on Github, is there anything I need to do
> next?
> I'm not quite familiar with this part of Github functionality..

Neither am I. I've just signed up for it, and I used haru as an
occasion to see how it works. I would first like to check that the other
encodings still work (pending your advice there), and then I'll see
whether I can merge my tree into yours.

Regards,
koen

Koen Deforche

Apr 12, 2011, 3:58:25 AM
to lib...@googlegroups.com
Hey Antony,

2011/4/11 Antony Dovgal <to...@daylessday.org>:


> On 04/11/2011 04:22 PM, Koen Deforche wrote:
>>
>> To be honest, I stopped with the patch to the point where the unicode
>> support worked well, but indeed have not looked too much at whether
>> the patch broke other encodings. Would the included tests help to
>> conclude that it does/does not work for other encoders ?
>
> Probably.
> I'll need to take a look at it first, though..

I tried yesterday. The ttfont_demo seems like one that should be
interesting, but I cannot locate a truetype font file that makes it
work (even with the libharu master branch). Do you have a font file
that you can use to reproduce the included demo pdf ?

Regards,
koen

Antony Dovgal

Apr 12, 2011, 4:20:15 AM
to lib...@googlegroups.com
On 04/12/2011 11:58 AM, Koen Deforche wrote:
> I tried yesterday. The ttfont_demo seems like one that should be
> interesting, but I cannot locate a truetype font file that makes it
> work (even with the libharu master branch). Do you have a font file
> that you can use to reproduce the included demo pdf ?

It works just fine with any TTF font I tried.
This is what I get with Verdana, for example:
http://dev.daylessday.org/diff/ttfont_demo.pdf

And it works both with your branch and mine =)

Laurent Humbertclaude

Apr 12, 2011, 4:47:56 AM
to lib...@googlegroups.com
Hi,

> The unicode support is now complete in the sense that also text
> selection works again, and the generated PDFs open without errors in
> all PDF readers and PDF tools I tested. For example:

Not sure if this is a problem with your patch, but there is a problem with
the RTL sentences when displayed with Evince 2.30.3 (in Ubuntu 10.04
LTS). The RTL sentences start with 2 empty squares, like a missing
character.
I have not checked whether there is a bug report on this issue.

> An example of a generated file is:
> http://www.emweb.be/public/haru/utf8.pdf (generated from
> http://www.emweb.be/public/haru/utf8.html)

Anyway, great work !
Regards,

Laurent

Antony Dovgal

Apr 12, 2011, 4:57:08 AM
to lib...@googlegroups.com
On 04/12/2011 12:47 PM, Laurent Humbertclaude wrote:
> Hi,
>
>> The unicode support is now complete in the sense that also text
>> selection works again, and the generated PDFs open without errors in
>> all PDF readers and PDF tools I tested. For example:
>
> Not sure if this is a problem of your patch but there is a problem for
> the RTL sentences when displayed with Evince 2.30.3 (in Ubuntu 10.04
> LTS). The RTL sentences starts with 2 empty squares, like a missing
> character.

Hmm..
I can't see anything like that in Okular and Acrobat Reader, but I do see those squares in Evince, yes.
Other viewers seem to be displaying them as white spaces.

Antony Dovgal

Apr 12, 2011, 5:01:43 AM
to lib...@googlegroups.com
On 04/12/2011 12:57 PM, Antony Dovgal wrote:
> Hmm..
> I can't see anything like that in Okular and Acrobat Reader, but I do see those squares in Evince, yes.
> Other viewers seem to be displaying them as white spaces.
>

Here is what I'm talking about (Evince on the left, Okular on the right):
http://dev.daylessday.org/d/conf3.png

Laurent Humbertclaude

Apr 12, 2011, 5:47:30 AM
to lib...@googlegroups.com
2011/4/12 Antony Dovgal <to...@daylessday.org>:

> On 04/12/2011 12:57 PM, Antony Dovgal wrote:
>>
>> Hmm..
>> I can't see anything like that in Okular and Acrobat Reader, but I do see
>> those squares in Evince, yes.
>> Other viewers seem to be displaying them as white spaces.
>>
>
> Here is what I'm talking about (Evince on the left, Okular on the right):
> http://dev.daylessday.org/d/conf3.png
>

Here is what I see,
http://img829.imageshack.us/img829/2945/utf8pdf2.png
On the left is the chromium rendering of the html file (but it is the
same in firefox 4). Look at the characters, they are all different,
seems like they are reversed.

Regards,
--
Laurent

Koen Deforche

Apr 12, 2011, 8:55:50 AM
to lib...@googlegroups.com
Hey Antony,

2011/4/12 Antony Dovgal <to...@daylessday.org>:


> On 04/12/2011 11:58 AM, Koen Deforche wrote:
>>
>> I tried yesterday. The ttfont_demo seems like one that should be
>> interesting, but I cannot locate a truetype font file that makes it
>> work (even with the libharu master branch). Do you have a font file
>> that you can use to reproduce the included demo pdf ?
>
> It works just fine with any TTF font I tried.
> This is what I get with Verdana, for example:
> http://dev.daylessday.org/diff/ttfont_demo.pdf
>
> And it works both with your branch and mine =)

Oops, sorry, I meant the ttfont_demo_jp. That one seems to use
another complex encoding which could be affected by the changes I
made.

Regards,
koen

Koen Deforche

Apr 12, 2011, 9:00:36 AM
to lib...@googlegroups.com
Hey Laurent,

2011/4/12 Laurent Humbertclaude <laurent.hu...@gmail.com>:

Good catch, I hadn't noticed the reversal.

Does libharu have support for bidi (RTL) text rendering?

I'm not sure how Right-to-Left text is supposed to work with PDF, but
I suspect in this case that even if libharu already supports that, the
code used for rendering this example doesn't handle Right-to-Left
text.

So, I'm not sure if libharu is to blame for this rendering problem (as
much as Wt::WTextRender is in this case).
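
For readers unfamiliar with the issue: text is stored in logical order, and something (the library or the calling code) has to reorder right-to-left runs before the glyphs are placed from left to right. A full solution is the Unicode Bidirectional Algorithm. The sketch below only detects whether a string contains code points from the Hebrew or Arabic blocks, so that a caller could know reordering is needed at all. The function names are hypothetical, nothing here is libharu or Wt code, and the ranges checked are deliberately incomplete.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check, not part of libharu or Wt: returns 1 if the code
 * point lies in the Hebrew (U+0590..U+05FF) or Arabic (U+0600..U+06FF)
 * block. Other right-to-left scripts are deliberately ignored here. */
static int is_rtl_code_point(uint16_t cp)
{
    return (cp >= 0x0590 && cp <= 0x05FF) ||
           (cp >= 0x0600 && cp <= 0x06FF);
}

/* Returns 1 if any of the n code points falls in one of those blocks,
 * which is a hint that the text needs bidi handling before rendering. */
static int needs_bidi(const uint16_t *text, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (is_rtl_code_point(text[i]))
            return 1;
    return 0;
}
```

A check like this does not reorder anything; it only tells the caller that emitting the code points in storage order would likely produce reversed text of the kind seen in the screenshot.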

Regards,
koen

Antony Dovgal

Apr 12, 2011, 9:05:53 AM
to lib...@googlegroups.com
On 04/12/2011 04:55 PM, Koen Deforche wrote:
> Oops, sorry, I meant the ttfont_demo_jp. That one seems to use
> another complex encoding which could be affected by the changes I
> made.

It works with Sazanami Gothic and some other fonts, like the one bundled in the demo/ttfont directory.
In both branches.

http://dev.daylessday.org/diff/ttfont_demo_jp.pdf

Koen Deforche

Apr 12, 2011, 9:26:14 AM
to lib...@googlegroups.com
Hey Antony,

2011/4/12 Antony Dovgal <to...@daylessday.org>:


> On 04/12/2011 04:55 PM, Koen Deforche wrote:
>>
>> Oops, sorry, I meant the ttfont_demo_jp. That one seems to use
>> another complex encoding which could be affected by the changes I
>> made.
>
> It works with Sazanami Gothic and some other fonts, like the one bundled in
> the demo/ttfont directory.
> In both branches.
>
> http://dev.daylessday.org/diff/ttfont_demo_jp.pdf

I can't get that file to open properly on any of my systems, but the
ttfont_demo_jp.pdf file that is in the repo displays fine using Adobe.
So I seem unable to generate a PDF that displays properly even with
the master version of libharu. Am I perhaps not invoking the
ttfont_demo_jp application properly?

Adobe Reader on MacOSX asked me to install the Japanese fonts, but
that didn't help.

Regards,
koen

HAYASHI Kentaro

Apr 13, 2011, 12:33:14 AM
to lib...@googlegroups.com
Hello,

It seems that the ttfont_demo_jp.pdf in the master repo was generated with the -E option.
With that option, at least, it works for me (master and 2.2.1, not the forked one).

> --
> ---
> libHaru.org development mailing list
> To unsubscribe, send email to libharu-u...@googlegroups.com

--
HAYASHI Kentaro <ken...@gmail.com>

Koen Deforche

Apr 13, 2011, 4:47:34 AM
to lib...@googlegroups.com
Hey,

2011/4/13 HAYASHI Kentaro <ken...@gmail.com>:


> Hello,
>
> It seems that ttfont_demo_jp.pdf in master repo was generated with -E option.
> then, at least, it is ok for me (master and 2.2.1, not forked one)

Thanks for the hint.
I can reproduce it now too (for the record, using kochi-gothic.ttf),
and the latest push to the forked branch fixes the Japanese encoding
regression.

Regards,
koen

Manoj

Aug 8, 2017, 6:15:48 AM
to libHaru
Hi,
Koen, your GitHub link is not working. Please post the correct one.

Manoj

Aug 8, 2017, 6:15:48 AM
to libHaru
Hi,
Koen, glad to see your work on Unicode support in libharu. I am also working on it, but on a microcontroller.
I am using libharu 2.4-dev:

 HPDF_UseUTFEncodings( Pdf );
 fontname = HPDF_LoadTTFontFromFile(Pdf, "a:couri.ttf", HPDF_TRUE);
 font = HPDF_GetFont(Pdf, fontname, "UTF-8");

The call HPDF_LoadTTFontFromFile(Pdf, "a:couri.ttf", HPDF_TRUE); terminates while reading the parameters and encoding from the .ttf file, so this code segment does not run to completion.

Please guide me in fixing the issue, thanks.
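
One way to narrow this down, independent of libharu, is to confirm that the file can be opened at that path and really starts with a TrueType signature before handing it to HPDF_LoadTTFontFromFile. The sketch below is only a diagnostic aid under that assumption: looks_like_truetype is a hypothetical helper, and a positive result says nothing about whether libharu can parse every table in the font.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic helper (not a libharu function): inspects the
 * first four bytes of a font file. Returns 1 for a plain TrueType file
 * (version 0x00010000 or the tag "true"), 0 for anything else, including
 * OpenType/CFF ("OTTO") and TrueType collections ("ttcf"). */
static int looks_like_truetype(const unsigned char *head, size_t len)
{
    static const unsigned char v1[4] = { 0x00, 0x01, 0x00, 0x00 };

    if (len < 4)
        return 0;
    return memcmp(head, v1, 4) == 0 || memcmp(head, "true", 4) == 0;
}
```

On the target, the four bytes would come from opening the same path ("a:couri.ttf" in the snippet above) in binary mode and reading the start of the file. If the open itself fails, the path or the file system layer is the more likely culprit than the font parser.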
On Monday, April 11, 2011 at 4:10:15 PM UTC+5:30, ko...@emweb.be wrote: