constness of psl_builtin() and other possible API enhancements

18 views
Skip to first unread message

Daniel Kahn Gillmor

unread,
Aug 11, 2015, 11:21:43 PM8/11/15
to libpsl development
hi there--

the only function that uses a non-const psl_ctx_t* is psl_free. It
knows to avoid trying to actually free() the builtin psl ctx.

I'm wondering if we shouldn't just return a regular (non-const)
psl_ctx_t* from psl_builtin -- it would make it easier to initialize a
variable and then forget about how the caller initialized it -- just
always free the thing.

Alternately, i was considering a new function psl_builtin_or_update()
which would look for the builtin filename in the current filesystem,
compare its mtime with the builtin mtime, and either build a new
psl_ctx_t and return it, or return the psl_builtin. this would need to
be a non-const return value, i think.

another couple thoughts:

--------------
what if passing NULL to any of the functions that require a psl_ctx_t
would automatically use the psl_builtin?

that would turn:

psl_is_public_suffix(psl_builtin(), domain);

into:

psl_is_public_suffix(NULL, domain);
--------------

Also: what if we had a non-threadsafe function like the proposed
psl_builtin_or_update() that initialized a single new builtin psl object
at runtime on the basis of the local filesystem copy of the builtin file
being updated?

--------------

I'm just thinking here trying to streamline the use case of the common
user who just wants to use the latest system copy, if we can't keep the
the C library updated with the same frequency as the baseline package.

I'm not sure about the tradeoffs for runtime efficiency
vs. packaging/distribution efficiency.

any thoughts?

--dkg

Tim Ruehsen

unread,
Aug 12, 2015, 6:22:49 AM8/12/15
to libps...@googlegroups.com, Daniel Kahn Gillmor
On Tuesday 11 August 2015 23:21:31 Daniel Kahn Gillmor wrote:
> hi there--
>
> the only function that uses a non-const psl_ctx_t* is psl_free. It
> knows to avoid trying to actually free() the builtin psl ctx.
>
> I'm wondering if we shouldn't just return a regular (non-const)
> psl_ctx_t* from psl_builtin -- it would make it easier to initialize a
> variable and then forget about how the caller initialized it -- just
> always free the thing.

As you write above, just fire psl_free() and forget.
If you want to avoid a compiler warning you'll have to use a cast like
psl_free((psl_ctx_t *)psl).

You can argue back and forth if to use const or not , in the end it is a
matter of taste (in this special case).

But just think of future extensions like e.g. psl_add_rule(psl_ctx_t *psl,
const char *rule). You simply can't do that with a builtin psl context. If we
change psl_builtin() to return non-const, you won't get a compiler
warning/error when calling psl_add_rule() - you'll see the problem only at
runtime. Staying with const allows the compiler to detect the problem at
compile time, which I prefer.
This is maybe not just a theoretical case, as I heard that Google uses their
own 'PSL' list builtin into Chrome/Chromium. We could have a function to merge
Mozillas PSL and Google's !?

> Alternately, i was considering a new function psl_builtin_or_update()
> which would look for the builtin filename in the current filesystem,
> compare its mtime with the builtin mtime, and either build a new
> psl_ctx_t and return it, or return the psl_builtin. this would need to
> be a non-const return value, i think.

Having builtin data requires regular recompilation of libpsl (whenever the PSL
becomes updated). That's how we designed it from the beginning.
Having the PSL data as git submodule since 0.8.0 makes it very easy to do
exactly this automatically.

AFAIR, for Debian there is no rule like 'if package x becomes rebuilt, also
rebuild package y'. If there is, libpsl could be automatically rebuilt each
time the package publicsuffix changes. Did you start a request for such ? (I
wouldn't even know where to ask.)
Maybe we could merge the publicsuffix and libpsl into one source package ?

To avoid the builtin / non-built conflict for a function like
psl_builtin_or_update(), we could instead have something like

if (psl_builtin_outdated())
psl = psl_load_file(psl_builtin_filename());

// fallback if something goes wrong
if (!psl)
psl = psl_builtin();


> another couple thoughts:
>
> --------------
> what if passing NULL to any of the functions that require a psl_ctx_t
> would automatically use the psl_builtin?
>
> that would turn:
>
> psl_is_public_suffix(psl_builtin(), domain);
>
> into:
>
> psl_is_public_suffix(NULL, domain);

Basically a good idea... but i have to check if builtin data exists before I
call psl_is_public_suffix(), else I possibly get a misleading 0 as return
value.
These kind of simplifications tend to provoke erroneous code.

> --------------
>
> Also: what if we had a non-threadsafe function like the proposed
> psl_builtin_or_update() that initialized a single new builtin psl object
> at runtime on the basis of the local filesystem copy of the builtin file
> being updated?

It wouldn't create or return a new builtin psl object. It either returns the
builtin psl or a new dynamically created. That makes such a function thread-
safe.

> --------------
>
> I'm just thinking here trying to streamline the use case of the common
> user who just wants to use the latest system copy, if we can't keep the
> the C library updated with the same frequency as the baseline package.
>
> I'm not sure about the tradeoffs for runtime efficiency
> vs. packaging/distribution efficiency.
>
> any thoughts?

Thanks for your ideas.
I also think, something must be changed to pass the frequent PSL changes to
applications.

Aggregated from above

1.
if (psl_builtin_outdated())
psl = psl_load_file(psl_builtin_filename());
We could add some code examples and/or 'best practice' into the docs.

2.
Maybe we could merge the publicsuffix and libpsl into one source package ?
That's for Debian only, I guess. That of course only makes sense if the PSL
data is not updated too often (once a week seems ok, daily updates makes it
inefficient).

3. (new point)
Just in theory. A web service, like e.g. OCSP responders (for certificate
revocation checking). :-) Maybe the DNS system would be able to give such
information ?

Best Regards, Tim

Daniel Kahn Gillmor

unread,
Aug 12, 2015, 9:36:36 AM8/12/15
to Tim Ruehsen, libps...@googlegroups.com
On Wed 2015-08-12 06:22:47 -0400, Tim Ruehsen wrote:
> As you write above, just fire psl_free() and forget.
> If you want to avoid a compiler warning you'll have to use a cast like
> psl_free((psl_ctx_t *)psl).
>
> You can argue back and forth if to use const or not , in the end it is a
> matter of taste (in this special case).
>
> But just think of future extensions like e.g. psl_add_rule(psl_ctx_t *psl,
> const char *rule). You simply can't do that with a builtin psl
> context.

This convinces me that my proposal was a bad idea :)

> Having builtin data requires regular recompilation of libpsl (whenever
> the PSL becomes updated). That's how we designed it from the
> beginning. Having the PSL data as git submodule since 0.8.0 makes it
> very easy to do exactly this automatically.
>
> AFAIR, for Debian there is no rule like 'if package x becomes rebuilt,
> also rebuild package y'. If there is, libpsl could be automatically
> rebuilt each time the package publicsuffix changes. Did you start a
> request for such ? (I wouldn't even know where to ask.) Maybe we
> could merge the publicsuffix and libpsl into one source package ?

i've thought about that, but i'm not sure what the best way to do this
is. I'll talk to folks at debconf next week and see if there's any good
approach we could do.

Bundling publicsuffix and libpsl into one source package would mean
needing to wait for the buildd's just to get the updated data out.

i like the idea of trying to trigger a binary rebuild of libpsl every
time publicsuffix is updated for a given distro, though, that seems
pretty clean.

> To avoid the builtin / non-built conflict for a function like
> psl_builtin_or_update(), we could instead have something like
>
> if (psl_builtin_outdated())
> psl = psl_load_file(psl_builtin_filename());
>
> // fallback if something goes wrong
> if (!psl)
> psl = psl_builtin();

that seems not too bad -- but what does psl_builtin_outdated() return if
there was no data built in?

>> another couple thoughts:
>>
>> --------------
>> what if passing NULL to any of the functions that require a psl_ctx_t
>> would automatically use the psl_builtin?
>>
>> that would turn:
>>
>> psl_is_public_suffix(psl_builtin(), domain);
>>
>> into:
>>
>> psl_is_public_suffix(NULL, domain);
>
> Basically a good idea... but i have to check if builtin data exists before I
> call psl_is_public_suffix(), else I possibly get a misleading 0 as return
> value.
> These kind of simplifications tend to provoke erroneous code.

yep, agree with you here. i retract this suggestion.

>> --------------
>>
>> Also: what if we had a non-threadsafe function like the proposed
>> psl_builtin_or_update() that initialized a single new builtin psl object
>> at runtime on the basis of the local filesystem copy of the builtin file
>> being updated?
>
> It wouldn't create or return a new builtin psl object. It either returns the
> builtin psl or a new dynamically created. That makes such a function thread-
> safe.

would it return the thing const or non-const?

> 3. (new point)
> Just in theory. A web service, like e.g. OCSP responders (for certificate
> revocation checking). :-) Maybe the DNS system would be able to give such
> information ?

This is the goal of the dbound working group in the IETF. a couple
weeks ago we met and there was (i thought) a sense that we might be
converging on a set of ideas to clarify the problem that we're trying to
solve. it's been slow going, but if specific proposals come up, i'll
try to think about how to map them to the libpsl API if we wanted to
make a special psl_ctx_t that talked to the DNS or a web service
directly.

Thanks for thinking this through with me. sorry that several of my
late-night ideas were duds ;)

--dkg

Tim Ruehsen

unread,
Aug 12, 2015, 10:51:17 AM8/12/15
to Daniel Kahn Gillmor, libps...@googlegroups.com
On Wednesday 12 August 2015 09:36:24 Daniel Kahn Gillmor wrote:
> > To avoid the builtin / non-built conflict for a function like
> > psl_builtin_or_update(), we could instead have something like
> >
> > if (psl_builtin_outdated())
> >
> > psl = psl_load_file(psl_builtin_filename());
> >
> > // fallback if something goes wrong
> > if (!psl)
> >
> > psl = psl_builtin();
>
> that seems not too bad -- but what does psl_builtin_outdated() return if
> there was no data built in?

psl_builtin_outdated() returns 1 in this case to trigger dynamic loading.
psl_builtin_filename() returns NULL in this case, so does
psl_load_file(psl_builtin_filename()) -> the fallback code is triggered.

What is missing above is the check for psl_builtin() returning NULL, which
should trigger some action in the application. E.g. Wget would fall back to
the old cookie checking code. Same that is being used when libpsl is not
compiled/linked into Wget.

Maybe you have a better idea (maybe just regarding the name).
I would like to use in Wget soon.

> >> --------------
> >>
> >> Also: what if we had a non-threadsafe function like the proposed
> >> psl_builtin_or_update() that initialized a single new builtin psl object
> >> at runtime on the basis of the local filesystem copy of the builtin file
> >> being updated?
> >
> > It wouldn't create or return a new builtin psl object. It either returns
> > the builtin psl or a new dynamically created. That makes such a function
> > thread- safe.
>
> would it return the thing const or non-const?

:-) Let's try to avoid such function and have something like
psl_builtin_outdated().

> > 3. (new point)
> > Just in theory. A web service, like e.g. OCSP responders (for certificate
> > revocation checking). :-) Maybe the DNS system would be able to give such
> > information ?
>
> This is the goal of the dbound working group in the IETF. a couple
> weeks ago we met and there was (i thought) a sense that we might be
> converging on a set of ideas to clarify the problem that we're trying to
> solve.

*lol* Your writing is so funny. I should put it into my citation list :-)

> Thanks for thinking this through with me. sorry that several of my
> late-night ideas were duds ;)

Ideas worth a discussion. Now it's documented on the mailing list for later
reference. That's not too bad.

Regards, Tim

Reply all
Reply to author
Forward
0 new messages