psl_registrable_domain limit to 8 domains is actually 9, plus performance suggestion

9 views
Skip to first unread message

davidt...@gmail.com

unread,
Mar 15, 2018, 6:01:56 PM3/15/18
to libpsl-bugs
The loop in both psl_registrable_domain() and psl_unregistrable_domain() does not break until entire domain is parsed or until 9 domains are found (not 8 as comment says).  This is off by one:
    ++nlabels > 8

Why is the limit intended to be 8 labels when there is nothing in the PSL list with more than 5 labels (AFAICT)?  Was this done to leave room for growth?

These routines call _psl_is_public_suffix() in a loop and it walks the passed domain counting labels and converting non-ASCII chars, in most cases multiple times for the same label.  Now that psl_registrable_domain() knows the number of labels, wouldn't it be better performance to do the ASCII conversion one time only in psl_registrable_domain() prior to its loop calling  _psl_is_public_suffix(), and then pass in the label count?

Finally, if the code calling PSL already has discerned the number of labels and that there are no non-ASCII chars, it would be nice for additional performance reasons if this information could be passed in to avoid the additional processing.

Thanks for this library! 

Tim Rühsen

unread,
Mar 19, 2018, 5:05:27 AM3/19/18
to davidt...@gmail.com, libpsl-bugs
Hi David,


On 03/15/2018 11:01 PM, davidt...@gmail.com wrote:
> The loop in both psl_registrable_domain() and psl_unregistrable_domain()
> does not break until entire domain is parsed or until 9 domains are found
> (not 8 as comment says). This is off by one:
> ++nlabels > 8

Mostly yes, but not if the domain ends with a dot.

> Why is the limit intended to be 8 labels when there is nothing in the PSL
> list with more than 5 labels (AFAICT)? Was this done to leave room for
> growth?

Yes exactly. 8/9 was just a random value big enough to cover possible
future extensions of the PSL. It's not foreseeable that this limit is
ever reached, but if so we change the code.

This change was basically to satisfy fuzzing. The fuzzer threw in a
domain with 10.000+ labels, ending up in a 25s timeout and then
generating an issue / bug report.
This is no real life test... anything below 100 labels should be fast
enough - but even 10+ labels in a domain are very rare.

> These routines call _psl_is_public_suffix() in a loop and it walks the
> passed domain counting labels and converting non-ASCII chars, in most cases
> multiple times for the same label. Now that psl_registrable_domain() knows
> the number of labels, wouldn't it be better performance to do the ASCII
> conversion one time only in psl_registrable_domain() prior to its loop
> calling _psl_is_public_suffix(), and then pass in the label count?
>
> Finally, if the code calling PSL already has discerned the number of labels
> and that there are no non-ASCII chars, it would be nice for additional
> performance reasons if this information could be passed in to avoid the
> additional processing.

I am not going to change the code since it works fast enough for real
life applications. If you have a use case where you'll need this type of
optimization, I'll be happy to accept your patch(es).

> Thanks for this library!

:-) You are welcome.

With Best Regards, Tim

signature.asc
Reply all
Reply to author
Forward
0 new messages