Fwd: Re: libpsl 0.5.0 released


Rockdaboot

Jul 3, 2014, 3:42:31 PM
to libps...@googlegroups.com
---------- Forwarded message ----------
From: Daniel Kahn Gillmor <d...@fifthhorseman.net>
Date: 03.07.2014 18:36
Subject: Re: libpsl 0.5.0 released
To: Tim Ruehsen <rockd...@gmail.com>
Cc:

> (replying offlist, since your message to me was offlist; i have no
> objection to any of this being posted publicly)
>
> On 07/03/2014 12:17 PM, Tim Ruehsen wrote:
> > I would like to use libicu runtime since it has the most complete features
> > and seems pretty common on a standard debian install.
> > The builtin library doesn't matter... All 3 libs generate the same data
> > currently.
> > What do you think?
>
>
> in debian, the packages have these priorities:
>
> 0 dkg@alice:~$ for x in libidn11 libidn2-0 libicu52; do printf "%s " $x;
> apt-cache show $x | grep Priority | head -n1; done
> libidn11 Priority: standard
> libidn2-0 Priority: extra
> libicu52 Priority: optional
> 0 dkg@alice:~$
>
> the priority ordering goes:
> required > important > standard > optional > extra
> see:
> https://www.debian.org/doc/debian-policy/ch-archive.html#s-priorities
>
> So libidn is most likely to be available everywhere, but if you think
> the featureset for libicu will be superior, i have no problem going with
> that (i confess i don't understand the specific tradeoffs).
>
> I definitely think it would be wise to use the same library for runtime
> as for builtin, though, as fun as it would be to learn about library
> incompatibilities that way :)
>
> --dkg
>


Tim Ruehsen

Jul 4, 2014, 10:24:59 AM
to libps...@googlegroups.com, Daniel Kahn Gillmor
On Thursday 03 July 2014 21:38:56 Rockdaboot wrote:
> ---------- Forwarded message ----------
> From: Daniel Kahn Gillmor <d...@fifthhorseman.net>
> Date: 03.07.2014 18:36
> Subject: Re: libpsl 0.5.0 released
> To: Tim Ruehsen <rockd...@gmail.com>
> Cc:
>
> > (replying offlist, since your message to me was offlist; i have no
> > objection to any of this being posted publicly)

Me fighting with the mobile client ;-)

> > On 07/03/2014 12:17 PM, Tim Ruehsen wrote:
> > > I would like to use libicu runtime since it has the most complete
> > > features and seems pretty common on a standard debian install.
> > > The builtin library doesn't matter... All 3 libs generate the same data
> > > currently.
> > > What do you think?
> >
> > in debian, the packages have these priorities:
> >
> > 0 dkg@alice:~$ for x in libidn11 libidn2-0 libicu52; do printf "%s " $x;
> > apt-cache show $x | grep Priority | head -n1; done
> > libidn11 Priority: standard
> > libidn2-0 Priority: extra
> > libicu52 Priority: optional
> > 0 dkg@alice:~$
> >
> > the priority ordering goes:
> > required > important > standard > optional > extra
> > see:
> > https://www.debian.org/doc/debian-policy/ch-archive.html#s-priorities
> >
> > So libidn is most likely to be available everywhere, but if you think
> > the featureset for libicu will be superior, i have no problem going with
> > that (i confess i don't understand the specific tradeoffs).

$ apt-rdepends --reverse --follow=Depends libidn2-0 2>/dev/null | awk '/^[^ ]/' | wc -l
4

$ apt-rdepends --reverse --follow=Depends libidn11 2>/dev/null | awk '/^[^ ]/' | wc -l
3767

$ apt-rdepends --reverse --follow=Depends libicu52 2>/dev/null | awk '/^[^ ]/' | wc -l
2558
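
For readers unfamiliar with the filter: as far as I can tell, apt-rdepends prints each package name at column 0 and its "Reverse Depends:" entries indented, so the awk pattern keeps only the package names and wc counts them (the queried package itself included). A minimal sketch on made-up input; the package names and the helper functions count_packages and sample_output are hypothetical and only illustrate the filter:

```shell
# Count the non-indented lines (package names) in apt-rdepends-style output.
count_packages() {
    awk '/^[^ ]/' | wc -l | tr -d ' '
}

# Hypothetical sample, shaped like `apt-rdepends --reverse` output.
sample_output() {
    printf '%s\n' \
        'libfoo1' \
        '  Reverse Depends: foo-utils (>= 1.0)' \
        '  Reverse Depends: libbar2' \
        'foo-utils' \
        'libbar2' \
        '  Reverse Depends: bar-tools' \
        'bar-tools'
}

sample_output | count_packages
```

The numbers are therefore sizes of the transitive reverse-dependency set, not counts of direct dependents.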

libidn11 is tiny in size, but IDNA2003 is very outdated.

libidn2-0 is also tiny in size, but plain IDNA2008 is outdated and has
incompatibilities with IDNA2003. It is not used by any Debian package (except
the idn2 and libidn2-0* packages).

libicu is huge, but has IDNA2008 with UTS#46 (TR46). It also has iconv-like
charset conversion and case mapping built in.

Using iconv() from libc6 will use (more or less) the same data tables as
libicu does (found on amd64 in /usr/lib/x86_64-linux-gnu/gconv/).

I have to read more about the C90 standard Unicode conversion functions. Those
functions always use the current locale/encoding - I am not sure how this can
be handled in multithreaded code (when threads need to handle different
encodings at the same time), especially when the input comes from different
sources with different encodings. I guess multithreading was not on anyone's
mind in the 1980s. But it would be cool to have psl_str_to_utf8lower() just
using standard functions.
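
To illustrate the difference, iconv takes the source encoding as an explicit argument instead of reading it from the current locale, which is what makes mixed-encoding input manageable. A minimal command-line sketch, assuming the iconv tool from libc is installed; the to_utf8 helper is hypothetical and this says nothing about how libpsl does it internally:

```shell
# to_utf8 ENCODING: convert stdin from ENCODING to UTF-8.
# The source encoding is passed explicitly, so the conversion does not
# rely on the LANG/LC_ALL settings of the calling process.
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# 'Übel.com' as ISO-8859-1 bytes (octal 334 = 0xDC is U-umlaut in Latin-1).
printf '\334bel.com\n' | to_utf8 ISO-8859-1
```

Each input source can be given its own encoding this way; lowercasing would still be a separate step.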

> > I definitely think it would be wise to use the same library for runtime
> > as for builtin, though, as fun as it would be to learn about library
> > incompatibilities that way :)

Right now the PSL contains data that leads to exactly the same output for
libidn, libidn2 and libicu. That might change as the PSL is extended.
I configured Travis CI to check all combinations of libraries for runtime and
builtin - I should also extend test-is-public-all.c to test each PSL file
entry against the builtin data.

With my current knowledge, I would choose libicu for libpsl Debian packages.


### some measurements nobody asked for ###

Some performance measurements (instruction counts from callgrind on a 3.1GHz
i3 Sandy Bridge). Here psl uses the built-in PSL data for lookup, but calls
psl_str_to_utf8lower() once for 'Übel.com':

$ LD_LIBRARY_PATH=/usr/oms/src/libpsl/src/.libs valgrind --tool=callgrind tools/.libs/psl Übel.com

runtime with libicu:  1,992,663
runtime with libidn2:   385,056
runtime with libidn:    411,543

The libpsl code itself (psl_is_public) takes <1% of the instructions,
psl_str_to_utf8lower() is not even shown by kcachegrind.
setlocale() takes 85k, the remaining cycles are due to library loading.


Another test for IDNA library performance - instead of using the built-in
data, we load the current PSL file:
$ LD_LIBRARY_PATH=/usr/oms/src/libpsl/src/.libs valgrind --tool=callgrind tools/.libs/psl --load-psl-file data/effective_tld_names.dat Übel.com

libicu: 13,594,978 (just 895k instructions for 292 punycode conversions*)
idn2:   29,222,281 (17,080k instructions for 292 punycode conversions*)
idn:    38,991,549 (28,001k instructions for 292 punycode conversions*)
(*in the current PSL data there are 292 non-ASCII domain names)

Conclusion: libicu adds ~1,900k instructions for library loading but does
punycode conversion 31x faster than libidn and still 19x faster than libidn2.
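
The speedup factors are simply the ratios of the instruction counts spent on the 292 punycode conversions (in thousands of instructions, taken from the table above). A quick check; the ratio helper is only an illustration:

```shell
# ratio A B: print A divided by B with one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 28001 895   # libidn  vs. libicu
ratio 17080 895   # libidn2 vs. libicu
```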

So if speed and/or functionality matters, libicu seems unbeaten.


Tim
