thoughts on libpsl in debian


Daniel Kahn Gillmor

Dec 11, 2015, 10:32:06 PM
to libpsl development
Hi libpsl folks--

i had a couple recent thoughts about libpsl in debian in conversation
with pabs on IRC:

a) I think libpsl should Recommend: publicsuffix, so that a normal
installation picks up that updated file. I'll probably go
ahead and make that change to the debian packaging shortly.

b) i think the main reason to not use an external file each time is the
speed of parsing. I also notice that upstream is working with a
DAWG/DAFSA approach borrowed from Chromium's analysis of gperf data.

I could have the publicsuffix package ship both a plaintext version
and a DAWG/DAFSA-compiled byte object, which libpsl could then mmap
and work from. Then we don't need a builtin list at all, we can
just Depends: publicsuffix (stronger than Recommends:) to keep the
byte object up-to-date.
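The mapping step of this proposal can be sketched with Python's standard `mmap` module. This is a minimal sketch, not libpsl code: the file path is an assumption, and how libpsl would walk the mapped DAFSA bytes is not shown.

```python
import mmap

# Assumed location of a DAFSA byte object shipped by the publicsuffix
# package; this thread does not specify a path.
DAFSA_PATH = "/usr/share/publicsuffix/public_suffix_list.dafsa"


def map_dafsa(path: str = DAFSA_PATH) -> mmap.mmap:
    """Map a compiled DAFSA file read-only so lookups can index it directly."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after the
        # file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

The mapping behaves like a read-only `bytes` object (`data = map_dafsa(); data[0]`), so a lookup routine can index into it without copying the file.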

What do you think about this approach?

--dkg

Tim Rühsen

Dec 12, 2015, 4:29:26 PM
to libps...@googlegroups.com, Daniel Kahn Gillmor
Hi Daniel,

On Friday, 11 December 2015, 22:32:03, Daniel Kahn Gillmor wrote:
> Hi libpsl folks--
>
> i had a couple recent thoughts about libpsl in debian in conversation
> with pabs on IRC:
>
> a) I think libpsl should Recommend: publicsuffix, so that a normal
> installation picks up that updated file. I'll probably go
> ahead and make that change to the debian packaging shortly.

Yes, as step one (before the next libpsl release).

I am still waiting for Darshit's .travis.yml patch (and he is waiting for
Travis to add the libunistring package to their new CI environment). This
should happen in the next few days.

> b) i think the main reason to not use an external file each time is the
> speed of parsing. I also notice that upstream is working with a
> DAWG/DAFSA approach borrowed from Chromium's analysis of gperf data.
>
> I could have the publicsuffix package ship both a plaintext version
> and a DAWG/DAFSA-compiled byte object, which libpsl could then mmap
> and work from. Then we don't need a builtin list at all, we can
> just Depends: publicsuffix (stronger than Recommends:) to keep the
> byte object up-to-date.

Very reasonable for Debian packaging. This would also solve the problem of
having to re-package libpsl whenever a new version of publicsuffix is
released (we talked about that in the past).

In my experience, mmap() isn't faster than read() on current Linux when
reading a whole file into memory. Since the data isn't very big either
(~32 KB currently), read() is preferable IMO. It is also more portable than
mmap().
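Reading the whole file with read() takes one fstat() for the size plus a short read loop. Here is a minimal sketch using Python's `os` wrappers around those POSIX calls. The function name is illustrative and not part of libpsl.

```python
import os


def read_whole_file(path: str) -> bytes:
    """Read an entire file into memory with plain read() calls."""
    # O_BINARY only exists (and only matters) on Windows.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        remaining = os.fstat(fd).st_size
        chunks = []
        # read() may return fewer bytes than requested, so loop until done.
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:  # unexpected EOF (file shrank); stop early
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

For a file of a few dozen kilobytes, this usually completes in a single read() call. It also needs nothing beyond basic POSIX I/O, which is the compatibility point made above.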

We'll need another function like psl_load_dafsa() or psl_load_file2() with an
additional filetype argument. Or do you think we could auto-detect the type of
file? (The PSL text file begins with "// ...", but I'm not sure whether DAFSA
data could also start with such a string.)
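Auto-detection could look at the first bytes of the file. Below is a minimal sketch. The magic prefix is purely hypothetical, since nothing in this thread defines one. It is included because a bare "//" check is ambiguous if DAFSA data happens to start with the bytes 0x2f 0x2f.

```python
# Hypothetical magic prefix a DAFSA converter could prepend; nothing in this
# thread defines one, and libpsl's real format may differ.
DAFSA_MAGIC = b"DAFSA-PSL\0"


def guess_psl_file_type(head: bytes) -> str:
    """Classify the first bytes of a file as 'dafsa', 'text' or 'unknown'."""
    if head.startswith(DAFSA_MAGIC):
        return "dafsa"
    # Tolerate a UTF-8 byte-order mark and leading whitespace before '//'.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    if head.lstrip().startswith(b"//"):
        return "text"
    return "unknown"
```

A loader would call it on a short prefix, e.g. `guess_psl_file_type(f.read(64))`. Returning "unknown" rather than guessing lets the caller fall back to an explicit filetype argument.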

Should libpsl provide an amended make_dafsa.py to convert PSL text into DAFSA?
Right now it converts to C code (an array).
Or should make_dafsa.py go into the publicsuffix/list project?

BTW, right now psl2c filters out plain TLDs, since they are covered by the
implicit '*' rule anyway. That's just a few bytes less to care about.
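The filtering can be illustrated with a short sketch. A rule consisting of a single label with no '*' or '!' marker (e.g. "com") yields the same public suffix as the implicit '*' rule. Such rules can therefore be dropped. This is only an illustration in Python, not psl2c's actual code. The function name is made up.

```python
def drop_plain_tlds(lines):
    """Drop single-label rules such as 'com' that the implicit '*' rule covers."""
    kept = []
    for line in lines:
        rule = line.strip()
        if not rule or rule.startswith("//"):
            continue  # blank lines and comments carry no rules
        rule = rule.split()[0]  # a PSL rule ends at the first whitespace
        # A plain TLD has one label and no '*' or '!' marker.
        if "." not in rule and not rule.startswith(("*", "!")):
            continue
        kept.append(rule)
    return kept
```

For example, `drop_plain_tlds(["// comment", "com", "co.uk", "*.ck", "!www.ck"])` keeps only the three multi-label rules.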

Tim

Tim Ruehsen

Dec 29, 2015, 11:35:01 AM
to libps...@googlegroups.com, Daniel Kahn Gillmor
I just added --binary to psl2c and to make_dafsa.py to generate binary DAFSA
files. Right now, psl2c is needed to convert the PSL into an intermediate
format, which is then converted by make_dafsa.py into DAFSA (either binary or
C code).

@Daniel Not sure if that is what you need. To create binary DAFSA data, you
have to add the right flags to each PSL entry. You also have to convert each
8-bit domain/label into punycode. If you need a stand-alone make_dafsa.py
(PSL -> libpsl DAFSA binary), just drop me a note.
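The per-entry preparation described above consists of deriving flags and converting to punycode. Here is a minimal sketch. The flag bit values and the "domain, flags" line format are hypothetical and are not libpsl's actual intermediate format. The punycode step uses Python's built-in IDNA 2003 codec.

```python
# Hypothetical flag bits; libpsl's actual values are not given in this thread.
FLAG_EXCEPTION = 1 << 0  # rule started with '!'
FLAG_WILDCARD = 1 << 1   # rule started with '*.'
FLAG_ICANN = 1 << 2      # rule came from the ICANN section
FLAG_PRIVATE = 1 << 3    # rule came from the PRIVATE section


def to_entry(rule: str, icann: bool) -> str:
    """Turn one PSL rule into a hypothetical 'punycode-domain, flags' line."""
    flags = FLAG_ICANN if icann else FLAG_PRIVATE
    if rule.startswith("!"):
        flags |= FLAG_EXCEPTION
        rule = rule[1:]
    elif rule.startswith("*."):
        flags |= FLAG_WILDCARD
        rule = rule[2:]
    # Python's 'idna' codec (IDNA 2003) converts 8-bit labels to punycode
    # ('xn--...') and leaves plain ASCII labels unchanged.
    domain = rule.encode("idna").decode("ascii")
    return f"{domain}, {flags}"
```

For example, `to_entry("bücher.de", True)` returns `'xn--bcher-kva.de, 4'`. Stripping the '!' and '*.' markers before encoding keeps that information in the flags rather than in the domain string.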

Tim