On Sat, Oct 13, 2018 at 10:34:25PM +0200, Tim Rühsen <
tim.r...@gmx.de> wrote:
> * DAFSA performance is lower due to its compressed format, see
So it's a classic memory/speed trade-off, and that's fine with me. Basing
my applications on the text list when performance is needed makes things
much simpler anyway, so it's a win-win for me.
My issue was merely that this does not seem to be mentioned anywhere, and it
would be highly useful to know :)
> I won't do anything here since browsers and other web clients normally
> do not need more than 10-20 lookups per second. If you have a special
> application that continuously needs 1M lookups / s or more, please
I figured as much. I also don't need 1M lookups/s, but even if I only need
100k/s, libpsl already takes 10% of my available time budget if it
does 1M/s, for something that probably isn't the main job of my app (which
is DNS analysis and involves hash lookups, DNS decoding and a lot more).
> optimize the code as needed and send us a patch. Or let us know the
> details + test data...
As mentioned, if I ever need it, I would base my solution on libhyperscan.
The fact that the simplified syntax seems to be the official syntax will
majorly help here. This solution would probably not be applicable to
psl, though, as libhyperscan is another (big-ish, maybe less portable)
dependency, and there is a place in the world for a libpsl that isn't
hyperfast but compact and easy to use...
> * creating a fast tool for DAFSA encoding would be nice and has been
> discussed. I simply don't have the time to work on it. Such a tool would
> also be relevant for libhsts DAFSA encoding.
I think the issue at hand is not the speed, but ease of use - not having to
run an external Python interpreter (and having it on your system in the first
place) from your C application would help *iff* you need DAFSA.
Now, since DAFSA is actually slower (on purpose), I think this is a
non-issue, since the only real reason to use it is when you have some kind
of distribution scheme, in which case the dependency probably isn't an
issue: every system where you can have Python as a dependency should be
fine with having the parsed text list in memory instead.
It's certainly not an issue for me at all. It would be more of an issue if
DAFSA gave me speed improvements, in which case it would be tempting to
use it :)
> * "full psl list syntax" with any kind of wildcard/star has been
> discussed upstream and has been dropped.
Ah, upstream is Mozilla? That is great news, because I can't imagine a
use case where multiple non-contiguous wildcards would be used, and
supporting them of course greatly complicates both implementations and
their API. I hope Mozilla updates their spec, so that people don't
implement it in vain.
> But if you see a real life feature is missing, let us know (best is to
> open an issue at Github.)
As far as my analysis of the public suffix list goes, there are no rules
with more than one wildcard, and the wildcard is always at the beginning,
so I think the current list should work with libpsl.
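For what it's worth, the check I have in mind is roughly the following. It is
only a sketch: rule_wildcard_is_simple is a name I made up (it is not part of
libpsl), and it assumes the rule has already been stripped of comments and
surrounding whitespace.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of libpsl.
 * Returns true if a public suffix list rule either has no wildcard,
 * or has exactly one wildcard that forms the whole leftmost label
 * (as in "*.ck"). */
static bool rule_wildcard_is_simple(const char *rule)
{
    const char *star = strchr(rule, '*');

    if (star == NULL)
        return true;                       /* no wildcard at all */
    if (star != rule || rule[1] != '.')
        return false;                      /* not the whole first label */
    return strchr(star + 1, '*') == NULL;  /* no second wildcard */
}
```

Running every rule of the list through such a function and looking for a
false result is enough to see whether a "complicated" rule exists.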
However, if this is the case, it again would be great if this were
documented, so users of psl_(un)registrable_domain would know that the
"registrable" domain part is always exactly one label longer than the
unregistrable domain part.
Thinking about it, this has nothing to do with wildcards at all; it's
basically how psl defines a registrable domain.
Still, it would be nice if it were spelled out explicitly.
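To make the property concrete, here is a small sketch of how I would state it
in code. count_labels and one_label_longer are hypothetical helpers of mine,
not libpsl functions, and they assume plain dot-separated names without
leading, trailing or doubled dots.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts dot-separated labels ("a.b.c" -> 3).
 * NULL and the empty string count as zero labels. */
static size_t count_labels(const char *domain)
{
    size_t n;

    if (domain == NULL || *domain == '\0')
        return 0;
    for (n = 1; *domain != '\0'; domain++)
        if (*domain == '.')
            n++;
    return n;
}

/* Hypothetical helper: true if `registrable` has exactly one more
 * label than `suffix`. */
static bool one_label_longer(const char *registrable, const char *suffix)
{
    return count_labels(registrable) == count_labels(suffix) + 1;
}
```

In a real program the two strings would presumably be the results of
psl_registrable_domain() and psl_unregistrable_domain() for the same input
domain, so that for a name under co.uk one would compare something like
"example.co.uk" against "co.uk".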
> * "The libpsl source code uses a very large number of C reserved
> identifiers". The _PSL_FLAG_* defines or is there anything more
Anything starting with _ is a reserved identifier in C, at least at file
scope (e.g. _utf8_to_utf32, _psl_idna_t, _vector_get...).
I haven't run into problems yet, but I will only ever use it on GNU/Linux
systems.
If you wait long enough and have enough platforms, it is, in my experience,
just a matter of time until you get clashes, especially with generic names
such as _vector_alloc or _utf8_to_utf32.
Given that the cost of _not_ using reserved identifiers is trivial, my
_personal_ choice would always be to avoid them, but I don't have
to maintain libpsl (and I hope that even if upstream goes away, Debian will
probably maintain it a bit more for me, so it isn't an issue for me).
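As an illustration of how cheap the alternative is: file-local helpers are
static anyway, so simply dropping the leading underscore is usually enough.
The struct and function below are invented for this sketch and are not the
actual libpsl internals.

```c
#include <stdlib.h>

/* Hypothetical internal vector; the layout is made up for this example. */
struct vector {
    void **entries;
    int max;   /* allocated slots */
    int cur;   /* used slots */
};

/* Was "_vector_alloc" in spirit: static gives internal linkage, so a
 * plain name cannot clash with other translation units or with names
 * reserved for the implementation. */
static struct vector *vector_alloc(int max)
{
    struct vector *v;

    if (max <= 0)
        return NULL;
    v = calloc(1, sizeof(*v));
    if (v == NULL)
        return NULL;
    v->entries = calloc((size_t)max, sizeof(*v->entries));
    if (v->entries == NULL) {
        free(v);
        return NULL;
    }
    v->max = max;
    return v;
}

static void vector_free(struct vector *v)
{
    if (v != NULL) {
        free(v->entries);
        free(v);
    }
}
```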
Anyway, thanks again for providing and maintaining libpsl. It
probably saved me a few days of implementation and debugging, and, more
importantly, removes one more thing I have to maintain.