On Tue 2016-07-12 15:36:24 +0200, Tim Ruehsen wrote:
>> [0] https://publicsuffix.org/list/
>
> see 'Divisions'
> "The Public Suffix List is subdivided, using markers in the comments, into two
> sections, labelled as ICANN domains and PRIVATE domains."
hm, i agree that is there, but there's no explicit specification for
what the comment actually is -- they could change the strings or reverse
the order of the domains without violating that clause, right? (your
linter would complain though).
maybe the spec needs updating to say explicitly how the sections are
structured, and to guarantee something about how the list itself is
introduced? If it did that then we could just use that part as
the "magic string", right?
yep, i see that stuff, and i understand the reason that the two parts of
the list exist. There are even more reasons to use the list, though
(see the discussion around the possible uses of DBOUND). In the
abstract, i think what they're specifying are two distinct PSLs:
* a PSL for cookies (ICANN + PRIVATE)
* a PSL for X.509 wildcard issuance (ICANN)
it's entirely possible that the PSL for, say, DMARC would be distinct
from (though partially overlapping with) either of these.
If you're loading a PSL for a specific purpose, it'd be great to just
load it and get boolean answers back :)
> [ dkg wrote: ]
>> I'd much rather use a magic word prefix for the relatively-unused DAFSA
>> format than change our expectations for an already widely-deployed file
>> format. It would also let us unequivocally distinguish even the most
>> perversely-generated DAFSA from a document that meets the PSL
>> specification, just by starting with a line that is not a valid domain
>> name should do the trick. and we get fgets for free as long as we
>> terminate the magic word with a newline.
>
> What exactly do you propose ?
> There should be at least some version information, just in case the format
> evolves.
how about this leading string:
'.DAFSA@PSL_0   \n'
in hex: 2e 44 41 46 53 41 40 50 53 4c 5f 30 20 20 20 0a
This has several advantages:
* the leading . explicitly violates the spec for the non-DAFSA file:
"Each rule lists a public suffix, with the subdomain portions
separated by dots (.) as usual. There is no leading dot."
* it contains both @ and _, characters traditionally not allowed in
host labels, and very unlikely to be present in the suffix of a
public registry despite more recent loosening of generic DNS label
constraints.
* it is newline-terminated, so it is a natural thing to parse when
using fgets().
* it is exactly 16 octets, so mmapping the file should leave the start
of the DAFSA itself aligned for typical memory architectures up to
128 bits, if alignment ever matters to the layout.
* it has (after the _) a version number, so that it is potentially
extendable.
* it distinguishes both the data structure (DAFSA) and the intent for
the data structure (PSL), so that such a structure won't be
accidentally mis-used for other purposes.
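
to make the fgets() point concrete, here is a minimal sketch in C of how a
loader might test for the proposed header. The helper name
psl_stream_is_dafsa() and the two macros are hypothetical, made up for
illustration; nothing like them exists in any library yet. It assumes the
header is exactly the 16 octets given above (12 visible characters, three
spaces, one newline):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Proposed header: ".DAFSA@PSL_0", three spaces, then a newline. */
#define DAFSA_PSL_MAGIC     ".DAFSA@PSL_0   \n"
#define DAFSA_PSL_MAGIC_LEN 16

/* Compile-time check that the literal really is 16 octets long. */
_Static_assert(sizeof(DAFSA_PSL_MAGIC) - 1 == DAFSA_PSL_MAGIC_LEN,
               "magic header must be exactly 16 octets");

/*
 * Hypothetical helper: returns 1 if the stream begins with the magic
 * line, 0 otherwise. On a match the stream is left positioned at
 * offset 16, where the DAFSA data would start.
 */
static int psl_stream_is_dafsa(FILE *fp)
{
    char line[DAFSA_PSL_MAGIC_LEN + 1]; /* 16 octets plus the NUL */

    assert(fp != NULL);

    /* fgets() stops after the newline or after 16 octets, whichever
     * comes first, so a longer first line can never match. */
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;

    return strcmp(line, DAFSA_PSL_MAGIC) == 0;
}
```

a caller would open the file, call psl_stream_is_dafsa(), and fall back to
the plain-text parser (after a rewind()) when it returns 0. Since a
plain-text PSL rule never has a leading dot, the first octet alone already
separates the two formats; comparing the full line additionally checks the
version.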
wdyt?
--dkg