Unicode


koen.d...@gmail.com

Jan 10, 2011, 6:24:43 AM
to libHaru
Hey all,

(Like many others) we would like to be able to write out Unicode text
using libharu. Unfortunately, it seems that a patch was recently
accepted which supports only 1-byte and 2-byte encoded characters,
even though libharu supports UCS-2 internally. Supporting
only UCS-2 would be acceptable to us, but the current UTF-8 encoder
does not support common characters such as the euro symbol (€, U+20AC),
which has a 3-byte encoding.
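
To make the 3-byte case concrete, here is a minimal sketch in C of how code points up to U+FFFF map to UTF-8. The name utf8_encode_bmp is made up for illustration and is not part of libharu, and the sketch does not reject surrogate code points (U+D800 to U+DFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libharu function.
 * Encodes one code point in the range U+0000..U+FFFF as UTF-8.
 * Writes 1 to 3 bytes into out (which must hold at least 3 bytes)
 * and returns the number of bytes written. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {
        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Called with 0x20AC (the euro sign), the function returns 3 and writes the bytes E2 82 AC, which is why an encoder limited to 1-byte and 2-byte sequences cannot represent it.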

It seems that an encoder implementation is available as a file
contribution which, at least from reading it, does exactly what we want.
What is the reason that this implementation was not adopted? If it
doesn't work, is it because of a bug or a fundamental flaw?

http://libharu.googlegroups.com/web/hpdf_encoder_utf8.c

Another attempt to support Unicode used UTF-16 (or rather, UCS-2)
directly. It appears that this should be straightforward too, but
it failed because StrLen() does not consider the encoding of the
string when looking for the terminator: it stops at the first zero
byte, and in UCS-2 many characters contain one. Any ideas on whether
this could be easily fixed?
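
A rough sketch of what an encoding-aware length check could look like, assuming the string is stored as a byte array of 16-bit units and ends with a two-byte 0x0000 terminator. The function name ucs2_byte_len is hypothetical and is not the libharu StrLen() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a libharu function.
 * Returns the length in bytes of a UCS-2 string, excluding its
 * terminator. Assumes s holds whole 16-bit units and is terminated
 * by two consecutive zero bytes on a unit boundary. */
size_t ucs2_byte_len(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);

    /* Step two bytes at a time so that a single zero byte inside a
     * unit (e.g. the high byte of 'A' = 0x0041) is not mistaken for
     * the end of the string. */
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;

    return n;
}
```

For the big-endian bytes 00 41 20 AC 00 00 ("A€"), a byte-oriented length function stops at the very first byte and reports 0, while this sketch reports 4.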

Regards,
koen

ko...@emweb.be

Jan 18, 2011, 5:14:46 PM
to libHaru
Hey all,

On Jan 10, 12:24 pm, "k...@emweb.be" <koen.defor...@gmail.com> wrote:
> Hey all,
>
> (Like many others) we would like to be able to write out Unicode text
> using libharu. Unfortunately, it seems that recently a patch was
> accepted which supports only 1-byte and 2-byte encoded characters
> while at the same time internally libharu supports UCS-2. Supporting
> only UCS-2 would be acceptable to us, but the current UTF-8 encoder
> does not support common characters such as the euro symbol (€) which
> has a 3-byte encoding.

I've spent some time on this, and now have a patch ready which
supports one- to three-byte UTF-8 sequences (i.e. the Unicode range
U+0000 to U+FFFF). The patch essentially combines ideas from
hpdf_encoder_utf8.c into hpdf_encoder_utf.c, but uses UCS-2 encoding
in the PDF. It also gets rid of the big encoding tables in
hpdf_encoder_utf.c, as the git diffstat shows:

 include/hpdf_encoder.h   |   10 +-
 src/hpdf_encoder.c       |   38 +-
 src/hpdf_encoder_utf.c   | 2299 ++++------------------------------------------
 src/hpdf_font_cid.c      |   43 +-
 src/hpdf_page_operator.c |   44 +-
 5 files changed, 262 insertions(+), 2172 deletions(-)
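
The conversion described above (one- to three-byte UTF-8 in, UCS-2 out) could look roughly like the following sketch. This is not the code from the patch: the function names are made up, big-endian output is an assumption, and overlong encodings and surrogates are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch.
 * Decodes one UTF-8 sequence of 1 to 3 bytes from s (len bytes
 * available) into *cp. Returns the number of bytes consumed, or 0 if
 * the sequence is malformed, truncated, or a 4-byte sequence. */
size_t utf8_decode_bmp(const unsigned char *s, size_t len, unsigned *cp)
{
    assert(s != NULL && cp != NULL);

    if (len >= 1 && s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
        return 2;
    }
    if (len >= 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned)(s[0] & 0x0F) << 12) |
              ((unsigned)(s[1] & 0x3F) << 6) |
              (unsigned)(s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Converts UTF-8 text to UCS-2, high byte first (an assumption).
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * input cannot be decoded or out is too small. */
size_t utf8_to_ucs2be(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len) {
        unsigned cp;
        size_t used = utf8_decode_bmp(in + i, in_len - i, &cp);

        if (used == 0 || o + 2 > out_cap)
            return (size_t)-1;

        out[o++] = (unsigned char)(cp >> 8);
        out[o++] = (unsigned char)(cp & 0xFF);
        i += used;
    }
    return o;
}
```

With the input bytes 41 E2 82 AC ("A€"), the sketch writes 00 41 20 AC. No lookup table is needed for this direction, which fits with the large number of deleted lines in hpdf_encoder_utf.c.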

If there is any interest in this patch, how can we contribute it?
Previously, I submitted a patch to the bug tracker (Id 0000029, for
HPDF_Page_Arc(), submitted 6 months ago), but that does not seem to be
the best way: it has been totally ignored so far.

Ideally, I would like to see both patches integrated into the official
libharu library.

Regards,
koen

Antony Dovgal

Jan 19, 2011, 7:22:44 AM
to lib...@googlegroups.com
On 01/19/2011 01:14 AM, ko...@emweb.be wrote:
> If there is any interest in this patch, how can we contribute it?

You can just post in the list.
This way it won't be lost and everybody interested will have a chance to review it.

> Previously, I submitted a patch to the bug tracker (Id 0000029, for
> HPDF_Page_Arc(), submitted 6 months ago), but that does not seem to be
> the best way: it has been totally ignored so far.

Would be nice to have an example or two demonstrating what exactly you're trying to achieve with it.

--
Wbr,
Antony Dovgal
---
http://pinba.org - realtime statistics for PHP

Michail Vidiassov

Jan 19, 2011, 8:59:43 AM
to lib...@googlegroups.com
Dear Antony,

you wrote:
>> Previously, I submitted a patch

> Would be nice to have an example or two demonstrating what exactly you're
> trying to achieve with it.

Did you get any examples when you committed
"Greatly improved U3D support (Nikhil Soman)"
to 2.2.0? Can you share them?

Sincerely, Michail


Koen Deforche

Jan 19, 2011, 9:04:46 AM
to lib...@googlegroups.com
Hey Antony,

On Wed, Jan 19, 2011 at 1:22 PM, Antony Dovgal <to...@daylessday.org> wrote:
> On 01/19/2011 01:14 AM, ko...@emweb.be wrote:
>> If there is any interest in this patch, how can we contribute it?
>
> You can just post in the list.
> This way it won't be lost and everybody interested will have a chance to review it.

Please find both patches in attachment.

>> Previously, I submitted a patch to the bug tracker (Id 0000029, for
>> HPDF_Page_Arc(), submitted 6 months ago), but that does not seem to be
>> the best way: it has been totally ignored so far.
>
> Would be nice to have an example or two demonstrating what exactly you're trying to achieve with it.

It solves two limitations in libharu w.r.t. paths which contain arcs:

1) Currently, when you add an arc to a path, the current path is
broken: an 'm' instruction is emitted to move to the start position of
the arc. This means that you cannot have an arc in the middle of a
path without breaking the path (and this affects filling behaviour).
This is solved by the 3rd hunk in the patch.

2) Secondly, you currently cannot add arcs with arbitrary spans to a
path: when you need a path which contains first a clockwise arc (of
e.g. 90 degrees), then a line, and then the same arc
counter-clockwise, you cannot define this with libharu. Libharu always
assumes that ang2 is larger than ang1. The patch fixes this by
simply removing this restriction, which allows drawing arcs with
negative spans.
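
The two points above can be sketched in C as follows. This is not the code from HPDF_Page_Arc.patch: the names are hypothetical, and splitting an arc into pieces of at most 90 degrees is an assumption made for the illustration. The sketch only shows the control logic (which path operator starts the arc, and how a signed span is divided); it does not compute the Bezier control points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one piece of an arc, angles in degrees. */
typedef struct {
    double from_deg;
    double to_deg;
} SubArc;

/* Point 1: if the path already has a current point, connect to the
 * start of the arc with a line ('l') instead of starting a new
 * subpath with a move ('m'). */
char arc_start_operator(int path_has_current_point)
{
    return path_has_current_point ? 'l' : 'm';
}

/* Point 2: split the span from ang1 to ang2 into pieces of at most
 * 90 degrees, keeping the direction of travel. ang2 may be smaller
 * than ang1, which gives pieces with a negative span.
 * Returns the number of pieces written (at most cap). */
size_t split_arc(double ang1, double ang2, SubArc *out, size_t cap)
{
    double dir = (ang2 >= ang1) ? 1.0 : -1.0;
    double remaining = (ang2 - ang1) * dir; /* absolute span */
    double from = ang1;
    size_t n = 0;

    assert(out != NULL);

    while (remaining > 0.0 && n < cap) {
        double step = remaining > 90.0 ? 90.0 : remaining;

        out[n].from_deg = from;
        out[n].to_deg = from + dir * step;
        from = out[n].to_deg;
        remaining -= step;
        n++;
    }
    return n;
}
```

For example, split_arc(0, -200, ...) gives three pieces: 0 to -90, -90 to -180, and -180 to -200. Each piece would then be drawn as one curve segment in the given direction.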

The two parts of the patch together make it possible to render
and fill the SVG path in attachment. Previously this was impossible
(AFAIK).

I hope this clarifies what we try to do.

Regards,
koen

path_arcs.svg
utf8.patch
HPDF_Page_Arc.patch