
Adding knowledge of subdomain structure to Necko...


Darin Fisher

Mar 24, 2006, 8:54:20 PM
to dev-pl...@lists.mozilla.org
I want to draw attention to this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=331510

In order to solve a variety of bugs that require local knowledge of
subdomain structure, we're proposing that we encode a list of
subdomains in Necko (Mozilla's networking library). I understand
fully that subdomain structure evolves over time, and so this is not
the most ideal solution, but given Firefox's update system, I think
this solution could be workable.

Long term, I would much prefer to implement a solution such as:
http://www.ietf.org/internet-drafts/draft-pettersen-subtld-structure-00.txt

However, that draft is well... just a draft, and Yngve tells us that
it is very likely to change.

If you would like to comment on this proposal, please reply to this
message instead of posting to the bug. We'd like to minimize
discussion traffic in the bug, thanks! :-)

-Darin

Robert O'Callahan

Mar 26, 2006, 4:23:08 PM

Oops, sorry.

It would be great to have a Necko interface for this ... if anyone gets
around to implementing WHATWG storage, they'll need it.

Rob

Darin Fisher

Mar 27, 2006, 9:55:23 AM
to dev-pl...@lists.mozilla.org

Hey Rob,

I'm pretty sure that WHATWG storage avoids this whole messy issue by
not merging the data for hostnames with that of subdomains. At the
bottom of section 4.9.1 it says:

<quote>
Each domain and each subdomain has its own separate storage area.
Subdomains can access the storage areas of parent domains, and domains
can access the storage areas of subdomains.

* globalStorage[''] is accessible to all domains.
* globalStorage['com'] is accessible to all .com domains
* globalStorage['example.com'] is accessible to example.com and
any of its subdomains
* globalStorage['www.example.com'] is accessible to
www.example.com and example.com, but not www2.example.com.
</quote>
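
As a rough illustration of the access rule quoted above, here is a minimal Python sketch. The function name `can_access` is hypothetical, and the label-by-label suffix comparison is an assumption about how the rule would be applied; the draft itself does not prescribe an algorithm.

```python
def can_access(host: str, storage_domain: str) -> bool:
    """Return True if `host` may use globalStorage[storage_domain].

    Reading of the quoted text: access is allowed when one name is a
    dot-separated suffix of the other (parent or subdomain), and the
    empty key is open to everyone.
    """
    if storage_domain == "":
        return True  # globalStorage[''] is accessible to all domains

    host_labels = host.lower().split(".")
    area_labels = storage_domain.lower().split(".")

    # Order the two names by label count; on a tie the order does not matter.
    shorter, longer = sorted((host_labels, area_labels), key=len)

    # Compare whole labels from the right, so "www2.example.com"
    # never matches "www.example.com".
    return longer[len(longer) - len(shorter):] == shorter


if __name__ == "__main__":
    for host, area in [
        ("www.example.com", "com"),
        ("example.com", "www.example.com"),
        ("www2.example.com", "www.example.com"),
    ]:
        print(host, "->", repr(area), can_access(host, area))
```

Note that nothing in this check needs to know which names are TLDs; that knowledge only becomes necessary once access to TLD-level storage areas is restricted.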

-Darin

Wladimir Palant

Mar 27, 2006, 10:09:14 AM
Darin Fisher wrote:
> I'm pretty sure that WHATWG storage avoids this whole messy issue by
> not merging the data for hostnames with that of subdomains. At the
> bottom of section 4.9.1 it says:
>
> <quote>
> Each domain and each subdomain has its own separate storage area.
> Subdomains can access the storage areas of parent domains, and domains
> can access the storage areas of subdomains.
>
> * globalStorage[''] is accessible to all domains.
> * globalStorage['com'] is accessible to all .com domains
> * globalStorage['example.com'] is accessible to example.com and
> any of its subdomains
> * globalStorage['www.example.com'] is accessible to
> www.example.com and example.com, but not www2.example.com.
> </quote>

I hope it will never be implemented in this form - that's all the
privacy concerns people have with cookies made even worse. The
suggestion from 4.9.7.1 is a MUST:

<quote>
Blocking access to the top-level domain ("public") storage areas: user
agents may prevent domains from storing data in and reading data from
the top-level domain entries in the globalStorage object.
</quote>

And then you will need to recognize TLDs.

Wladimir

Benjamin Smedberg

Mar 27, 2006, 10:19:39 AM
Darin Fisher wrote:

> In order to solve a variety of bugs that require local knowledge of
> subdomain structure, we're proposing that we encode a list of
> subdomains in Necko (Mozilla's networking library). I understand

How will this affect corporate networks with custom nameservers? I know of
at least one large organization which uses <machine>.<unit>.<privateTLDcode>
to identify machines/services within the organization. Would this be
affected by necko's understanding of subdomains?

--BDS

Darin Fisher

Mar 27, 2006, 12:23:28 PM
to dev-pl...@lists.mozilla.org

We would probably want to provide corporations with a way to extend
necko's knowledge of subdomains, and in conjunction with that, we'd
need to define sane behavior for when we do not know about the
substructure of a TLD. My suggestion would be to interpret the last
node of the hostname as the TLD (when the last node is unrecognized)
for grouping purposes.
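
A minimal sketch of that fallback, in Python. Everything named here is hypothetical: the function `grouping_suffix`, the built-in suffix set, and the `corp` private TLD are illustrations only, not an existing Necko interface.

```python
def grouping_suffix(host, built_in, site_local=()):
    """Return the suffix used to group `host`.

    `built_in` stands for the list shipped with the browser and
    `site_local` for entries an organization adds itself (both are
    assumptions for this sketch).
    """
    known = set(built_in) | set(site_local)
    labels = host.lower().rstrip(".").split(".")

    # Try the longest candidate first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in known:
            return candidate

    # Nothing recognized: treat the last node of the hostname as the TLD.
    return labels[-1]


if __name__ == "__main__":
    built_in = {"com", "co.nz"}
    print(grouping_suffix("machine.unit.corp", built_in))
    print(grouping_suffix("machine.unit.corp", built_in, {"unit.corp"}))
```

With only the built-in set, `machine.unit.corp` falls back to `corp`; once the organization registers `unit.corp` locally, that entry is used instead.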

-Darin

Darin Fisher

Mar 27, 2006, 12:30:13 PM
to Wladimir Palant, dev-pl...@lists.mozilla.org
On 3/27/06, Wladimir Palant <tr...@gtchat.de> wrote:
> Darin Fisher wrote:
> > I'm pretty sure that WHATWG storage avoids this whole messy issue by
> > not merging the data for hostnames with that of subdomains. At the
> > bottom of section 4.9.1 it says:
> >
> > <quote>
> > Each domain and each subdomain has its own separate storage area.
> > Subdomains can access the storage areas of parent domains, and domains
> > can access the storage areas of subdomains.
> >
> > * globalStorage[''] is accessible to all domains.
> > * globalStorage['com'] is accessible to all .com domains
> > * globalStorage['example.com'] is accessible to example.com and
> > any of its subdomains
> > * globalStorage['www.example.com'] is accessible to
> > www.example.com and example.com, but not www2.example.com.
> > </quote>
>
> I hope it will never be implemented in this form - that's all the
> privacy concerns people have with cookies made even worse.

Please explain.


> The
> suggestion from 4.9.7.1 is a MUST:

I read that section, and I don't see the word "MUST" anywhere. The
snippet you quoted says "may" in fact.

I agree that we could use knowledge of subdomain structure to
implement such a restriction for whatwg storage.

-Darin

Wladimir Palant

Mar 27, 2006, 12:55:01 PM
Darin Fisher wrote:
>> I hope it will never be implemented in this form - that's all the
>> privacy concerns people have with cookies made even worse.
>
> Please explain.

It allows data to be stored for an empty domain so that every site on the
internet can read it - perfect for user tracking. Blocking access from
third-party scripts isn't a real solution, scripts can be proxied by the
server that includes them. The suggestion to limit lifetime of the
storage data is better, but still a partial solution. Which leaves us
with the third suggestion - don't allow data to be stored for TLDs and
above (probably with a way to define exceptions). That's why I say it is
a "must" - and I fail to see the reason why the specification doesn't.

Wladimir

Darin Fisher

Mar 27, 2006, 1:10:58 PM
to Wladimir Palant, dev-pl...@lists.mozilla.org
On 3/27/06, Wladimir Palant <tr...@gtchat.de> wrote:

It sounds to me as though you should take this up with the WhatWG.
They have an open discussion forum (http://whatwg.org/mailing-list)
where this issue would be best discussed. I'm sure that there must be
folks on that list who will be able to defend the current design.

IMO, that sort of user tracking is already possible by including an
<img> tag or something similar from pages that will inform a tracking
site when the page is loaded.

-Darin

Wladimir Palant

Mar 27, 2006, 1:21:28 PM
Darin Fisher wrote:
> It sounds to me as though you should take this up with the WhatWG.

Yes, probably. I'm sure somebody brought this issue up already, so I
will dig through their archives when I have a little time.

> IMO, that sort of user tracking is already possible by including an
> <img> tag or something similar from pages that will inform a tracking
> site when the page is loaded.

AFAICT nowadays no major browser allows web bugs to set cookies (I think
IE6 only got this fixed with SP2). You can track users by IP/user agent
(this can't be prevented) but you can't be sure that you are really
tracking the same user. Global storage would be a step backwards unless
I misunderstand something.

Wladimir

Darin Fisher

Mar 27, 2006, 1:38:06 PM
to Wladimir Palant, dev-pl...@lists.mozilla.org
On 3/27/06, Wladimir Palant <tr...@gtchat.de> wrote:

Any site that wishes to participate in such a tracking mechanism can
contact the tracking site for a user ID, and then they can set the
user ID as a cookie on their domain. Then in the future they can send
that user ID to the tracking site. There's no requirement that the
user ID be stored only once for all domains on the browser in order to
enable this sort of tracking. Yes, it would make it easier to
implement, but only a little bit easier. It wouldn't matter much to
someone who was determined to participate in a cross-site user
tracking system.

-Darin

Robert O'Callahan

Mar 27, 2006, 4:30:19 PM
to Darin Fisher, dev-pl...@lists.mozilla.org
WHATWG Storage requires TLD structure knowledge to implement reasonable
quota policies. You want to restrict evil.com from consuming unlimited
quota by storing data under a.evil.com, b.evil.com and so on. A
reasonable way to do that is to charge all storage under *.evil.com to
evil.com's quota. But you've got to stop at public TLDs, or at least
treat them differently. We don't want all domains under *.co.nz to get
the same amount of quota, together, as evil.com gets for itself.
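
One way to picture this quota policy is the Python sketch below. The function `quota_owner` and the tiny suffix set are hypothetical; the idea shown is simply "charge storage to the public suffix plus one label", which is an assumed reading of the policy described above rather than anything specified by WHATWG.

```python
from collections import Counter


def quota_owner(host, public_suffixes):
    """Return the domain whose quota is charged for storage under `host`."""
    labels = host.lower().rstrip(".").split(".")

    # Find the longest known public suffix; default to the last label.
    suffix_len = 1
    for i in range(len(labels)):
        if ".".join(labels[i:]) in public_suffixes:
            suffix_len = len(labels) - i
            break

    # The owner is the suffix plus one more label, if the host has one.
    owner_len = min(len(labels), suffix_len + 1)
    return ".".join(labels[-owner_len:])


if __name__ == "__main__":
    suffixes = {"com", "co.nz"}  # hypothetical, far from complete
    usage = Counter()
    for host, size in [("a.evil.com", 100), ("b.evil.com", 100),
                       ("shop.example.co.nz", 50), ("other.co.nz", 50)]:
        usage[quota_owner(host, suffixes)] += size
    print(dict(usage))
```

Here `a.evil.com` and `b.evil.com` are both charged to `evil.com`, while `shop.example.co.nz` and `other.co.nz` are charged separately because `co.nz` is treated as a public suffix.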

Rob

Robert O'Callahan

Mar 27, 2006, 4:32:22 PM
to Darin Fisher, dev-pl...@lists.mozilla.org

Darin Fisher

Mar 27, 2006, 4:12:55 PM
to Robert O'Callahan, dev-pl...@lists.mozilla.org

Good point!

-Darin

Pam Greene

Mar 29, 2006, 4:29:01 PM
to dev-pl...@lists.mozilla.org
On 3/26/06, Robert O'Callahan <rob...@ocallahan.org> wrote:
Darin Fisher wrote:
> I want to draw attention to this bug:
> https://bugzilla.mozilla.org/show_bug.cgi?id=331510
>
> In order to solve a variety of bugs that require local knowledge of
> subdomain structure, we're proposing that we encode a list of
> subdomains in Necko (Mozilla's networking library).  I understand
> fully that subdomain structure evolves over time, and so this is not
> the most ideal solution, but given Firefox's update system, I think
> this solution could be workable.
>
> Long term, I would much prefer to implement a solution such as:
> http://www.ietf.org/internet-drafts/draft-pettersen-subtld-structure-00.txt
>
> However, that draft is well... just a draft, and Yngve tells us that
> it is very likely to change.
>
> If you would like to comment on this proposal, please reply to this
> message instead of posting to the bug.  We'd like to minimize
> discussion traffic in the bug, thanks! :-)

Okay, so... leaving WhatWG for the moment and bringing this back to the subdomain service:

Discussion in the bug has so far concluded that for the moment, we want to maintain a domain list at mozilla.org that browsers periodically download separate from the auto-update system (because distributors might have overridden that).  If at some point in the future someone else starts maintaining a file we want to use instead, we can always redirect there.  (The chances of their file matching the format we've used are pretty slim, though, so more likely we'd have to set up a separate process to build our own file from the other one, at least for a while.)

Which leads to the next question.  Anybody have strong opinions about what our file should look like?  Yngve's proposed format is nicely general, but it's fairly arcane, and may be difficult for someone who doesn't think in a formal grammar to create or maintain.  It's also subject to change, so it's not obvious that we should use that for forward compatibility anyway.  A plain text list of known domains has the opposite problem: creation and parsing are nearly trivial, but it's not as flexible.  Where should we aim?

- Pam

Darin Fisher

Mar 29, 2006, 8:45:13 PM
to Pam Greene, dev-pl...@lists.mozilla.org
On 3/29/06, Pam Greene <pamg...@gmail.com> wrote:
> On 3/26/06, Robert O'Callahan <rob...@ocallahan.org> wrote:
> > Darin Fisher wrote:
> > > I want to draw attention to this bug:
> > > https://bugzilla.mozilla.org/show_bug.cgi?id=331510
> > >
> > > In order to solve a variety of bugs that require local knowledge of
> > > subdomain structure, we're proposing that we encode a list of
> > > subdomains in Necko (Mozilla's networking library). I understand
> > > fully that subdomain structure evolves over time, and so this is not
> > > the most ideal solution, but given Firefox's update system, I think
> > > this solution could be workable.
> > >
> > > Long term, I would much prefer to implement a solution such as:
> > >
> http://www.ietf.org/internet-drafts/draft-pettersen-subtld-structure-00.txt
> > >
> > > However, that draft is well... just a draft, and Yngve tells us that
> > > it is very likely to change.
> > >
> > > If you would like to comment on this proposal, please reply to this
> > > message instead of posting to the bug. We'd like to minimize
> > > discussion traffic in the bug, thanks! :-)
> >
>
> Okay, so... leaving WhatWG for the moment and bringing this back to the
> subdomain service:
>
> Discussion in the bug has so far concluded that for the moment, we want to
> maintain a domain list at mozilla.org that browsers periodically download
> separate from the auto-update system (because distributors might have
> overridden that). If at some point in the future someone else starts
> maintaining a file we want to use instead, we can always redirect there.
> (The chances of their file matching the format we've used are pretty slim,
> though, so more likely we'd have to set up a separate process to build our
> own file from the other one, at least for a while.)
>
> Which leads to the next question. Anybody have strong opinions about what
> our file should look like? Yngve's proposed format is nicely general, but
> it's fairly arcane, and may be difficult for someone who doesn't think in a
> formal grammar to create or maintain. It's also subject to change, so it's
> not obvious that we should use that for forward compatibility anyway. A
> plain text list of known domains has the opposite problem: creation and
> parsing are nearly trivial, but it's not as flexible. Where should we aim?
>
> - Pam

I think we first need to figure out what the format should express,
and then it should be fairly straightforward to select a reasonable
file format.

-Darin

Pam Greene

Mar 31, 2006, 5:16:49 PM
to Darin Fisher, dev-pl...@lists.mozilla.org
Well, Yngve's format implicitly describes some of the issues in the course of addressing them:

- Many top-level domains contain sub-domains that effectively function as TLDs, in the sense that many independent domains (i.e., somebody's site) exist as immediate children of those sub-domains.
- The previous point is true recursively: some level-2 domains contain level-3 domains that effectively function as TLDs, and so on.
- However, the hierarchy is not strict.  A given level might have some children that are independent entities and some that are effectively TLDs; sometimes those are expressed as a list of included subdomains, sometimes as a list of excluded ones.
- And of course, there's no guarantee that things will stay even as minimally self-consistent as they are now.

To effectively handle the possibilities, and especially the complexity of .jp (see https://bugzilla.mozilla.org/show_bug.cgi?id=252342#c31), I'd suggest this:

- The longest (i.e., highest-level) matching item in the list is considered the hostname's "effective TLD".
- An asterisk * matches any valid sequence of characters.
- If nothing in the list matches a hostname, the last dot-suffix is used as a default.
- An exclamation mark ! indicates an exception to a wildcarded rule, where that sub-domain should be used instead.
- A line is only considered up to the first whitespace, leaving the rest for comments.

So for example, using the classifications in http://wiki.mozilla.org/TLD_List,

com  # Type A: not needed, but included for completeness
*.uk   # Type B
be     # Type C: base case not needed, but included for completeness
ac.be
jp      # Type D: complicated
ac.jp
...
*.hokkaido.jp  # hosts in .hokkaido.jp can't set cookies below level 4...
*.tokyo.jp
...
!metro.tokyo.jp
!pref.hokkaido.jp  # ...except hosts in pref.hokkaido.jp, which can set level 3
...
!city.shizuoka.jp
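
A minimal Python sketch of how the rules above might be applied to such a file. Several points are assumptions rather than part of the proposal: the names `parse_rules` and `effective_tld` are hypothetical, `*` is taken to match exactly one label, comment-only lines are skipped, and an exception entry is read as "the effective TLD is the part after the excepted label" (so `!pref.hokkaido.jp` yields `hokkaido.jp`, matching the level-3 comment in the example).

```python
SAMPLE_RULES = """\
com  # Type A
*.uk   # Type B
be     # Type C
ac.be
jp      # Type D
ac.jp
*.hokkaido.jp
*.tokyo.jp
!metro.tokyo.jp
!pref.hokkaido.jp
"""


def parse_rules(text):
    """Split a rule file into plain/wildcard rules and '!' exceptions."""
    rules, exceptions = set(), set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines; skipping comment-only lines is an added assumption.
        if not fields or fields[0].startswith("#"):
            continue
        entry = fields[0].lower()  # text after the first whitespace is a comment
        if entry.startswith("!"):
            exceptions.add(entry[1:])
        else:
            rules.add(entry)
    return rules, exceptions


def effective_tld(host, rules, exceptions):
    """Return the effective TLD of `host` under the parsed rules."""
    labels = host.lower().rstrip(".").split(".")

    # Exceptions override wildcard rules (assumed interpretation of '!').
    for i in range(len(labels)):
        if ".".join(labels[i:]) in exceptions:
            return ".".join(labels[i + 1:])

    # Longest match wins: start with the full name and drop labels from the left.
    for i in range(len(labels)):
        exact = ".".join(labels[i:])
        wildcard = ".".join(["*"] + labels[i + 1:])
        if exact in rules or wildcard in rules:
            return exact

    # Nothing matched: the last dot-suffix is the default.
    return labels[-1]


if __name__ == "__main__":
    rules, exceptions = parse_rules(SAMPLE_RULES)
    for host in ("www.example.co.uk", "www.pref.hokkaido.jp", "intranet.corp"):
        print(host, "->", effective_tld(host, rules, exceptions))
```

With the sample rules, `www.example.co.uk` maps to `co.uk` through the `*.uk` wildcard, `www.pref.hokkaido.jp` maps to `hokkaido.jp` through the exception, and `intranet.corp` falls back to `corp`.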

But I doubt I'll end up maintaining the file, so I'd like to hear from whoever might be.  Does this format look easy enough to understand and keep up to date?  And for everybody else, does it look like it'll cover all the cases?

- Pam


Philip Chee

Apr 1, 2006, 5:11:00 AM
On Fri, 31 Mar 2006 16:17:05 -0600, Pam Greene wrote:

> *.uk # Type B

Doesn't the UK have several second level domains?
e.g. <http://www.parliament.uk/>

Phil
--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.
[ ]Behind every succesfull man is woman with nothing to wear
* TagZilla 0.059
