Extending XHTML's #fragment-id for multiple-ids-in-one

Edward Lee

May 8, 2007, 11:02:45 PM

XHTML currently uses #fragment-id to reference a single location within
an XHTML document, and this only uses a small fraction of all possible
values for a #fragment-id. By using an unreserved character, multiple
"sub"-#fragment-ids can be separated within a whole #fragment-id. This
would maintain backwards compatibility while allowing new functionality
to be added later.

Comments about this idea would be appreciated. More details below.

--------------------

The URI General Syntax [1] allows an optional fragment identifier at the
end of the URI, and Section 4.1 states that the semantics of the
fragment identifier depends on the MIME type of the retrieved data.

Currently XHTML documents use the fragment identifier to jump to the
element that has the matching "id" attribute. This is different from
HTML 4 which allows for fragment identifiers to match "name" attributes.
XHTML 1.0 [2] suggests only using [A-Za-z][A-Za-z0-9:_.-]* for
compatibility.

The URI General Syntax allows many more characters in a fragment
identifier than an XHTML id needs, including the "mark" character
group [3], "-_.!~*'()". Of these, only "-_." are valid id characters.
Mark characters do not need to be escaped, so using a character like "!"
won't result in unnecessary %hexhex escaping.
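
As a rough illustration of the point above, the following sketch computes which "mark" characters are left over as possible separators once the id characters are removed. The id pattern and the mark set are taken from the text; the names XHTML_ID, MARK and is_compatible_id are made up for this example.

```python
import re

# Pattern XHTML 1.0 suggests for ids, as quoted in the text above.
XHTML_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")

# RFC 2396 "mark" characters, which need no %hexhex escaping.
MARK = set("-_.!~*'()")

# Marks that are also valid inside an id, so they cannot act as separators.
ID_MARKS = set("-_.")

# Marks that can never appear in a compatible id: separator candidates.
separator_candidates = sorted(MARK - ID_MARKS)


def is_compatible_id(text):
    """Return True if the whole string matches the suggested id pattern."""
    return XHTML_ID.fullmatch(text) is not None


print(separator_candidates)
print(is_compatible_id("jump_to_here"), is_compatible_id("s1!s2"))
```

Any of the leftover characters would do as a separator under this reasoning; "!" is simply the one used in the rest of this post.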

Assuming web browsers are non-greedy*, an existing browser given a "!" in
the fragment identifier would treat the characters before the "!" as the
id attribute to find. Newer browsers that support separating
identifiers with "!" in the fragment identifier would correctly break it
into multiple pieces.
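
A minimal sketch of the two behaviours described above, assuming "!" is the separator. The function names and the greedy flag are hypothetical; they only model the difference between a browser that stops at the first invalid id character and one that takes the whole fragment.

```python
def legacy_target(fragment, greedy):
    """Id an existing browser would look for.

    greedy=True models taking the whole fragment as the id;
    greedy=False models stopping at the first "!".
    """
    return fragment if greedy else fragment.split("!", 1)[0]


def split_fragment(fragment):
    """Pieces a "!"-aware browser would see."""
    return fragment.split("!")


print(legacy_target("s1!s2!s4", greedy=False))
print(legacy_target("s1!s2!s4", greedy=True))
print(split_fragment("s1!s2!s4"))
```

Only the non-greedy case is backwards compatible: the greedy browser searches for an id that cannot exist.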

With multiple identifiers, XHTML documents could then be extended to
allow multiple elements to be targeted so if the first "id" doesn't
exist, the second can be tried. Alternatively, multiple "id"s could
provide a sequence of elements to jump to. (e.g., #s1!s2!s4 would first
jump to an element whose id is s1 and when the user goes "next" it'll
jump to s2.)
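
The two readings of a multi-id fragment could be sketched as follows. This is only an illustration: the document is modelled as a plain set of ids, and both function names are invented here.

```python
def first_existing(fragment, ids_in_document):
    """Fallback reading: return the first listed id present in the document."""
    for candidate in fragment.split("!"):
        if candidate in ids_in_document:
            return candidate
    return None


def jump_sequence(fragment, ids_in_document):
    """Sequence reading: ids to visit in order, skipping missing ones."""
    return [c for c in fragment.split("!") if c in ids_in_document]


# Hypothetical document that has no element with id "s1".
ids = {"s2", "s4"}
print(first_existing("s1!s2!s4", ids))
print(jump_sequence("s1!s2!s4", ids))
```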

Multiple identifiers would also allow Link Fingerprints [4] to work on
pages that also want to reference content within the page. Pretending
the Link Fingerprint identifier is "(md5)1234CAFE", a full fragment
identifier that references an id "jump_to_here" would be
"#jump_to_here!(md5)1234CAFE". Current browsers would jump to
jump_to_here and ignore the invalid-id-character "!" while newer
browsers would process the Link Fingerprint as well as jump to the element.
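
One way a newer browser might tell the pieces apart, sketched under the assumption that anything not matching the XHTML id pattern is some other kind of identifier (such as the pretend "(md5)1234CAFE" above). The classify function is hypothetical.

```python
import re

# Id pattern suggested by XHTML 1.0, as quoted earlier in this post.
XHTML_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def classify(fragment):
    """Split on "!" and sort the pieces into ids and everything else."""
    ids, extras = [], []
    for piece in fragment.split("!"):
        if XHTML_ID.fullmatch(piece):
            ids.append(piece)
        else:
            extras.append(piece)
    return ids, extras


print(classify("jump_to_here!(md5)1234CAFE"))
```

Because "(" cannot start an id, the fingerprint piece is never mistaken for a jump target.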

Ed

[1] http://www.ietf.org/rfc/rfc2396.txt (section 4.1)
[2] http://www.w3.org/TR/xhtml1/#C_8
[3] http://www.ietf.org/rfc/rfc2396.txt (section 2.3)
[4] http://www.gerv.net/security/link-fingerprints/

* It seems like most (all?) browsers take the whole fragment-id as the
"id" instead of only the characters that are valid for an "id." E.g.,
Firefox...
const nsACString& sNewRef = Substring(refStart, refEnd);
http://mxr.mozilla.org/mozilla/source/docshell/base/nsDocShell.cpp#7246

Axel Hecht

May 9, 2007, 8:10:55 AM

Three points of comment:

- .firefox is likely not the appropriate group for this, not that I'm in
the need for one right now.

- Your * comment basically turns down your proposal itself, AFAICT

- You should look at xpointer. It might just do what you want, even the
fallback part.
https://bugzilla.mozilla.org/attachment.cgi?id=119333&action=view#xpath1(id("notso")|id("id3"))
for example.

http://developer.mozilla.org/en/docs/XML_in_Mozilla#XML_Linking_and_Pointing

Axel

Edward Lee

May 9, 2007, 10:05:50 AM

Axel Hecht wrote:
> - Your * comment basically turns down your proposal itself, AFAICT

Well, it turns down the backwards compatibility in practice, but is that
much of a problem? Existing browsers lose the ability to jump to an
"id", but newer browsers would still get the multiple parts.

> - .firefox is likely not the appropriate group for this

There would be a need to change the app, as pointed out in the * comment,
so I was looking to see if I'm on the right track. To prepare for the
"backwards compatibility", the only change the code should need is to
stop at the first invalid character instead of grabbing everything?

> - You should look at xpointer.

Oh neat. I was going to mention "xpath bookmarks" [1], but I wasn't sure
if there was something else that does it. Thanks for the pointer. ;) [2]

Ed

[1] http://ecmanaut.blogspot.com/2007/04/xpath-bookmarks.html
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=32832

Edward Lee

May 9, 2007, 10:26:06 AM

Axel Hecht wrote:
> - You should look at xpointer.

The XPointer Framework page [1] gives an example of a URI-escaped
XPointer...

#xpointer(string-range(//P,%22my%20favorite%20smiley%20:-%5E)%22))

So it seems commas are okay to use even though they're reserved, as
long as they're part of the spec's syntax and not part of some data?

Also, it references the RFC for IPv6 addresses in URLs [2], which adds the
[square brackets] to the reserved character set. Perhaps these combined could
make "nicer" URLs:

...#jump_here,jump_there,[md5!1234CAFE]

Not too different from what I suggested earlier with (md5)... As long as
something looking at the URI can easily tell which things definitely
aren't "jump-to-id" identifiers. (Start it with a character that can't
be in an "id.")
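
A sketch of how something looking at the URI could separate the two kinds of pieces in the comma-and-brackets form. The syntax is only the one floated in this message, not anything specified, and parse_fragment is a made-up name.

```python
def parse_fragment(fragment):
    """Split "a,b,[name!value]" into jump targets and bracketed extensions."""
    targets, extensions = [], {}
    for part in fragment.split(","):
        if part.startswith("[") and part.endswith("]"):
            # "[" cannot start an id, so this is not a jump-to-id piece.
            name, _, value = part[1:-1].partition("!")
            extensions[name] = value
        else:
            targets.append(part)
    return targets, extensions


print(parse_fragment("jump_here,jump_there,[md5!1234CAFE]"))
```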

Ed

[1] http://www.w3.org/TR/xptr-framework/#escExamples
[2] http://www.rfc-editor.org/rfc/rfc2732.txt (section 3.3)

Gervase Markham

May 15, 2007, 7:30:12 AM

Edward Lee wrote:
> I'm still not too sure about what exactly we'll want to request in a RFC
> - Link Fingerprints for "?". It skipped my mind earlier, but you pointed
> out "other URI schemes" (as in other than HTTP), so if I understand
> things correctly, there seems to be 3 potential places to request.
>
> 1) URI: All URIs can use Link Fingerprints
> 2) HTTP: Only HTTP-based requests use Link Fingerprints

I think that, as you suggest, to prevent unforeseen consequences, we
probably need to define it for only a limited set of schemes.
http:
https:
ftp:
(I believe FTP currently doesn't use the fragment identifier, so we'd
need to test and see if The Right Thing happened in common user agents.)

> 3) Content-type: Only certain file types know about Link Fingerprints

I don't think we need to mandate restrictions here. Imagine:

http://www.foo.com/legal-document.html#!md5!09F9...

served with:

Content-Type: text/html
Content-disposition: attachment; filename=legal-document.html

> "URI" would be too broad when only a couple specific instances of it is
> needed: HTTP, FTP, "?". The URI spec states that the semantics of a
> #fragment-id is based on the MIME type,

Really? Do you have a reference for that? That could be both good and bad...

> There's a Technical Report for Common User Agent Problems about how new
> fragments should be handled [1]. It references an internet draft that
> describes how #fragment-ids are handled [2].

Note: We would need to disregard part of that suggestion ("If URI2
already has a fragment identifier, then #frag must not be appended and
the new target is URI2") for security reasons.

The draft you reference never made it to RFC status, and is eight years old.

Gerv

Edward Lee

May 15, 2007, 11:55:11 AM

Gervase Markham wrote:
>> The URI spec states that the semantics of a #fragment-id is based
>> on the MIME type,

Second paragraph of Section 4.1 in the URI general syntax RFC [1]:

4.1. Fragment Identifier

When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.

fragment = *uric

The semantics of a fragment identifier is a property of the data
resulting from a retrieval action, regardless of the type of URI used
in the reference. Therefore, the format and interpretation of
fragment identifiers is dependent on the media type [RFC2046] of the
retrieval result. The character restrictions described in Section 2
for URI also apply to the fragment in a URI-reference. Individual
media types may define additional restrictions or structure within
the fragment for specifying different types of "partial views" that
can be identified within that media type.

A fragment identifier is only meaningful when a URI reference is
intended for retrieval and the result of that retrieval is a document
for which the identified fragment is consistently defined.

>> There's a Technical Report for Common User Agent Problems about how
>> new fragments should be handled [1]. It references an internet
>> draft that describes how #fragment-ids are handled [2].
>

> The draft you reference never made it to RFC status, and is eight
> years old.

The draft is eight years old, but it was referenced in the technical
report from 2003. Bert Bos, who wrote the draft, filed bug 9040 [2] a
day before the draft's date, and this bug has been fixed for a long time.

> Note: We would need to disregard part of that suggestion ("If URI2
> already has a fragment identifier, then #frag must not be appended
> and the new target is URI2") for security reasons.

Right, I made a post in m.d.t.network because at minimum, that network
code needs to be changed (nsHttpChannel.cpp [3]).

Ed

[1] http://www.ietf.org/rfc/rfc2396.txt
[RFC2046] http://www.ietf.org/rfc/rfc2046.txt
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=9040
[3] http://mxr.mozilla.org/mozilla/source/netwerk/protocol/http/src/nsHttpChannel.cpp#2253

Edward Lee

May 16, 2007, 4:05:18 PM

Edward Lee wrote:
> Gervase Markham wrote:
>>> The URI spec states that the semantics of a #fragment-id is based
>>> on the MIME type,

I just noticed there's a newer RFC for URI Generic Syntax [1] that
obsoletes RFC 2396, which I've been referencing. The issues around
#fragment-ids are the same, but it does describe the #fragment-id in
more detail.

3.5. Fragment

The semantics of a fragment identifier are defined by the set of
representations that might result from a retrieval action on the
primary resource. The fragment's format and resolution is therefore
dependent on the media type [RFC2046] of a potentially retrieved
representation, even though such a retrieval is only performed if the
URI is dereferenced. If no such representation exists, then the
semantics of the fragment are considered unknown and are effectively
unconstrained. Fragment identifier semantics are independent of the
URI scheme and thus cannot be redefined by scheme specifications.

The interesting thing here is the last sentence saying that the URI
scheme, like HTTP, cannot dictate what the #fragment-id does.

I ran a little test as suggested by dveditz, and Firefox /does/ jump to
the anchor when loading an HTML page from an FTP resource with a
#fragment-id. This is expected because the content is HTML, which defines a
use for the #fragment-id. file:// works just like ftp:// and http://.
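
The scheme independence of the fragment can also be seen at the URI-parsing level. This small sketch uses Python's standard urllib.parse (not the Firefox code) with made-up example URLs; it shows only that a generic parser splits off the fragment the same way for every scheme, not how any browser acts on it.

```python
from urllib.parse import urlsplit

# Hypothetical URLs that differ only in scheme.
urls = [
    "http://example.com/page.html#section",
    "ftp://example.com/page.html#section",
    "file:///tmp/page.html#section",
]

# The fragment is split off identically, whatever the scheme is.
fragments = [urlsplit(url).fragment for url in urls]
print(fragments)
```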

So the question is: Will we have to define Link Fingerprints for every
MIME type? Or will we need to request a change to let schemes define
meaning for the #fragment-id? (This also impacts the RFC for adding a
generic syntax for #[hash:md5]1234CAFE or #[find:text]search.)

(Probably more along the lines of "all mime types should recognize the
Link Fingerprint #fragment-id.")

Ed

[1] http://www.ietf.org/rfc/rfc3986.txt (Section 3.5)

Gervase Markham

May 21, 2007, 6:34:02 AM

Edward Lee wrote:
> So the question is: Will we have to define Link Fingerprints for every
> MIME type?

If we did that, whose toes would we be stepping on? Everyone's or
no-one's? If the URI scheme cannot dictate what #fragment-id does, who can?

> (Probably more along the lines of "all mime types should recognize the
> Link Fingerprint #fragment-id.")

Either that, or have an explicit list:

application/octet-stream
application/zip
application/gzip
application/x-xpi
...

Hmm. It would be a rather long list. :-(

Perhaps we could define it for application/*? But that's not very clean
either.
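
To make the two options concrete, here is a small sketch comparing an explicit list with an application/* wildcard. The list holds only the four types named above (the "..." is left unfilled), and both function names are invented for the example.

```python
from fnmatch import fnmatchcase

# Only the types explicitly named above; the real list would be longer.
EXPLICIT_TYPES = {
    "application/octet-stream",
    "application/zip",
    "application/gzip",
    "application/x-xpi",
}


def recognized_by_list(mime_type):
    """Explicit-list option: the type must be named individually."""
    return mime_type.lower() in EXPLICIT_TYPES


def recognized_by_wildcard(mime_type):
    """Wildcard option: anything under application/ qualifies."""
    return fnmatchcase(mime_type.lower(), "application/*")


for t in ("application/zip", "application/x-tar", "text/html"):
    print(t, recognized_by_list(t), recognized_by_wildcard(t))
```

The list misses types nobody thought to add, while the wildcard misses everything outside application/, which is roughly the "not very clean" trade-off mentioned above.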

Gerv

Edward Lee

May 21, 2007, 6:49:39 PM

Edward Lee wrote:
> "sub"-#fragment-ids separated within a whole #fragment-id

There's an internet draft, "URI Fragment Identifiers for the text/plain
Media Type" [1], that defines multiple fragment identifier methods for
text/plain (char, line, match, length, md5) that can be combined
with semicolons.

Ed

[1] http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-06.txt

Edward Lee

May 21, 2007, 8:04:44 PM

Gervase Markham wrote:
> Edward Lee wrote:
>> So the question is: Will we have to define Link Fingerprints for every
>> MIME type?
> If we did that, whose toes would we be stepping on? Everyone's or
> no-one's?

Seems more like everyone's.

Relating to adding new syntax to the #fragment-id (and not just Link
Fingerprints), adding special meaning to characters would require them
to be properly escaped. E.g., using "!" for #!md5!1234 might require
text/plain's regular expression matching [1] (working draft) to escape
the character: #match=bang! -> #match=bang\!. The draft already
explicitly states the semicolon (used as a delimiter) needs to be
backslash-escaped in the regular expression.
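
A rough sketch of what backslash-escaping a delimiter involves. It does not follow the exact grammar of the text/plain draft; the function names, the default character sets and the sample fragments are assumptions made for illustration.

```python
def escape(value, specials="!;\\"):
    """Prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in specials else ch for ch in value)


def split_unescaped(text, delimiter=";"):
    """Split on delimiter, treating backslash-delimiter as a literal."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in (delimiter, "\\"):
            current.append(text[i + 1])  # keep the escaped character itself
            i += 2
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


print(escape("bang!"))
print(split_unescaped("match=a\\;b;md5=1234"))
```

Every new character that gains special meaning adds one more entry to the set that each fragment user has to escape.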

The above issue wouldn't be too bad if the extended #fragment-id were part of
the URI: "(except the characters which are required by the URI syntax to
be escaped)," but we can only specify #!md5! per MIME type.

A separate issue of toe-stomping is unnecessarily redefining actions. If
there are existing uses of #fragment-ids for a MIME type, defining a
global cross-MIME-type definition would duplicate the effort. The same
internet draft [1] has a "#md5=1234abcd" which could overlap with
#!md5!1234abcd. But here the text/plain draft could just be revised to not
expect an md5=<hash>.

(I'm still not sure which types already define what the fragment is used
for and which have proposals. So far it's been text/html and text/plain.)

Ed

[1] http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-06.txt

Gervase Markham

May 22, 2007, 6:43:37 AM

Edward Lee wrote:
> Relating to adding new syntax to the #fragment-id (and not just Link
> Fingerprints), adding special meaning to characters would require them
> to be properly escaped.

I don't understand. ! was chosen in large part because URL syntax does
not require it to be escaped.

> E.g., using "!" for #!md5!1234 might require
> text/plain's regular expression matching [1] (working draft)

Ah, well spotted. We should get in touch with that guy and coordinate.
He seems to be doing a lot of the same stuff as us.

> A separate issue of toe-stomping is unnecessarily redefining actions. If
> there are existing uses of #fragment-ids for a MIME-type, defining a
> global cross-MIME-type definition would duplicate the effort.

Only if the two definitions served the same purpose. After all, we are
not "duplicating the effort" of linking to IDs within HTML documents
(the existing use of fragment-ids for HTML).

Gerv

Edward Lee

May 22, 2007, 12:32:10 PM

Gervase Markham wrote:
> I don't understand. ! was chosen in large part because URL syntax does
> not require it to be escaped.

Right. "!" doesn't need to be escaped because it doesn't have special
meaning in the #fragment-id as defined by URI (general syntax) or HTTP
(specific syntax). Link Fingerprints is a "user" of the #fragment-id, so
we're "peers" with other things that want to use the #fragment-id.
There's potential for conflict because there's a limited number of
desirable characters (other specs might want to use "!" for the same
reasons you chose it in the first place).

A specific instance of this issue is part of the reason why Link
Fingerprints start with "!"; there's no existing usage of "!" for HTML
"id"s, so it doesn't conflict for text/html.

If the text/plain working draft happened to use "!" instead of ";" to
delimit multiple pieces, how would the browser decide if the "!"s are
text/plain's delimiters or surrounding a hash type? (And similarly, how
to decide when to force the #fragment-id to be passed on to the target
URI on a redirect.)

Ed

Edward Lee

May 22, 2007, 12:51:19 PM

Edward Lee wrote:
> HTTP spec will need to get touched because Link Fingerprints must be passed
> along through HTTP redirects.

If Link Fingerprints is defined per MIME type, how can we force the Link
Fingerprint #fragment-id to overwrite the new URI's #fragment-id on a
redirect if we don't get a file type until we actually get to the final
destination?

The default response from Apache on an HTTP/1.1 302 Found redirect
includes "Content-Type: text/html", and it wouldn't know the actual
file type of the target. But the type in the response can be anything, so
we shouldn't decide whether to use Link Fingerprints by looking at the type
of the intermediate response.

<?php
header("Location: bad.exe");
header("Content-type: application/x-fake-link-fingerprint");
?>

HTTP/1.1 302
Date: Tue, 22 May 2007 16:49:58 GMT
Server: Apache/1.3.37 (Unix) mod_throttle/3.1.2 DAV/1.0.3 mod_fastcgi/2.4.2 mod_gzip/1.3.26.1a PHP/4.4.4 mod_ssl/2.8.22 OpenSSL/0.9.7e
X-Powered-By: PHP/5.2.1
Location: bad.exe
Content-Type: application/x-fake-link-fingerprint
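
A minimal sketch of redirect handling that ignores the intermediate Content-Type entirely and carries the original fingerprint across. It is not the nsHttpChannel code; redirect_target is a hypothetical helper, the "#!md5!" form is the one used elsewhere in this thread, and the hash value is the pretend one from earlier.

```python
from urllib.parse import urldefrag, urljoin


def redirect_target(original_url, location):
    """Resolve a Location header, keeping an original "!" fragment."""
    base, fragment = urldefrag(original_url)
    resolved = urljoin(base, location)
    if fragment.startswith("!"):
        # Assumed Link Fingerprint: it overrides any fragment on the target.
        target, _dropped = urldefrag(resolved)
        return target + "#" + fragment
    return resolved


print(redirect_target("http://www.foo.com/important.zip#!md5!1234CAFE",
                      "bad.exe#other"))
```

Note that the response's media type never enters the decision, which is the point being made above.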

Ed

Gervase Markham

May 23, 2007, 4:45:11 AM

Edward Lee wrote:
> If the text/plain working draft happened to use "!" instead of ";" to
> delimit multiple pieces, how would the browser decide if the "!"s are
> text/plain's delimiters or surrounding a hash type? (And similarly, how
> to decide when to force the #fragment-id to be passed on to the target
> URI on a redirect.)

The obvious solution to this is that we need to work with this guy to
make our syntax compatible. That's why it's published as a draft :-)

Gerv

Gervase Markham

May 23, 2007, 4:48:38 AM

Edward Lee wrote:
> If Link Fingerprints is defined per MIME-type, how can we force the Link
> Fingerprint #fragment-id to overwrite the new URI's #fragment-id on a
> redirect if we don't get a file type until we actually get to the final
> destination.

Good question.

In fact, this means that we have to make Link Fingerprints valid for all
MIME types. Here's why.

Say I put up a file http://www.foo.com/important.zip , and send out a
link to people:
http://www.foo.com/important.zip#!md5!09F9...

If someone hacks my webserver, they could cause that URL to return a 302
redirect to their trojaned file, with Content-Type: something/random. If
we only preserved link fingerprints for MIME types we knew, then the
fingerprint would fall off, and the user would get the evil file without
a warning.
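
For illustration, the check that would catch the trojaned file could look roughly like this, assuming the "#!md5!<hex>" form used in this thread and that the fingerprint has survived the redirect. fingerprint_matches is a made-up name, and the downloaded bytes are passed in directly rather than fetched.

```python
import hashlib
from urllib.parse import urldefrag


def fingerprint_matches(url, data):
    """Compare data against a "#!md5!<hex>" fragment.

    Returns True or False when a fingerprint is present, and None when
    the URL carries no fingerprint in that form.
    """
    _base, fragment = urldefrag(url)
    parts = fragment.split("!")
    if len(parts) != 3 or parts[0] != "" or parts[1] != "md5":
        return None
    return hashlib.md5(data).hexdigest().lower() == parts[2].lower()


good = b"the real file"
url = "http://www.foo.com/important.zip#!md5!" + hashlib.md5(good).hexdigest()
print(fingerprint_matches(url, good))
print(fingerprint_matches(url, b"the evil file"))
```

The media type of the response plays no part in the check, so it works the same for something/random as for application/zip.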

Gerv

Edward Lee

May 28, 2007, 6:37:34 PM

With Link Fingerprints using the #fragment-id, other programs will want
to add special meaning to it as well. This means there'll be more and
more conflicts if only one id is allowed in the #fragment-id at a time.

Metalinks [1] are already supported by several download managers
(GetRight, FlashGot, DownThemAll, etc.) and used by various sites
(openSUSE, OpenOffice, Ubuntu, etc.) to allow download managers to
automatically choose mirrors, check file integrity, and speed downloads.

If a content provider wants to ensure a download with Link Fingerprints
as well as providing a Metalink alternative for download managers that
support it, that currently cannot be done.

From the Metalink 3 spec [2]...

1.7 Backward compatibility

If clients support it, Metalink 3.0 is backward compatible with regular
hyperlinks. This is done by adding
#!metalink3!http://www.example.com/file.ext.metalink onto the end of a
URL like so:

http://www.example.com/file.ext#!metalink3!http://www.example.com/file.ext.metalink

Clients that do not recognize Metalink 3.0 will drop what is after the
first #. This backward compatibility is inspired by Gervase Markham’s
Link Fingerprints.
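
A sketch of how a supporting client might pull the Metalink location out of such a URL, based only on the form quoted above. metalink_location is a hypothetical helper and not taken from any download manager.

```python
from urllib.parse import urldefrag

PREFIX = "!metalink3!"


def metalink_location(url):
    """Return (plain_url, metalink_url or None) for a Metalink-style link."""
    plain_url, fragment = urldefrag(url)
    if fragment.startswith(PREFIX):
        return plain_url, fragment[len(PREFIX):]
    return plain_url, None


print(metalink_location(
    "http://www.example.com/file.ext"
    "#!metalink3!http://www.example.com/file.ext.metalink"))
```

Since the whole fragment is consumed by the Metalink URL, there is no room left for a Link Fingerprint in the same link, which is the conflict described above.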

Ed

[1] http://metalinker.org/
[2] http://metalinker.org/Metalink_3.0_Spec.txt

Gervase Markham

May 29, 2007, 4:40:46 AM

Edward Lee wrote:
> If a content provider wants to ensure a download with Link Fingerprints
> as well as providing a Metalink alternative for download managers that
> support it, that currently cannot be done.

Despite the fact that the syntax is inspired by Link Fingerprints, I'm
not a fan of metalink. I strongly believe it should have been done as a
microformat, allowing backwards compatibility with old web browsers. And
what's that bogus artefact "3" doing on there anyway?

I also think it's a solution looking for a problem.

Gerv

Mike Shaver

May 29, 2007, 7:46:33 AM

to Gervase Markham, dev-apps...@lists.mozilla.org

On 5/29/07, Gervase Markham <ge...@mozilla.org> wrote:
> Edward Lee wrote:
> > If a content provider wants to ensure a download with Link Fingerprints
> > as well as providing a Metalink alternative for download managers that
> > support it, that currently cannot be done.
>
> Despite the fact that the syntax is inspired by Link Fingerprints, I'm
> not a fan of metalink. I strongly believe it should have been done as a
> microformat, allowing backwards compatibility with old web browsers.

I disagree. Doing it as a microformat would have required that the
link be wrapped in HTML, and that means it's harder to manage with
download managers, send in email, paste in IRC, use on a command line,
or otherwise work with. Especially for large files, where metalink is
most useful, web browsers are often not involved at all. It's a
descriptor of how to fetch and verify the resource, which seems like
an entirely appropriate thing to build into a URL.

And it is backwards compatible with old web browsers to the same
extent that link fingerprints are, AFAICT.

Why not do link fingerprints as a microformat?

> And
> what's that bogus artefact "3" doing on there anyway?

I think that's what the kids are calling "versioning" or
"future-proofing" these days.

> I also think it's a solution looking for a problem.

I think it's found one -- providing mirror information and hash values
lets download clients select better (for the user and the set of
providers) download behaviour, detect and deal with corruption, and
generally reason about their fetching task in a superior way.
Effective mirror selection and verification are not trivial tasks, as
anyone who has tried to download a Linux distribution from a foreign
site can likely attest.

Mike

Gervase Markham

May 29, 2007, 8:06:19 AM

Mike Shaver wrote:
> I disagree. Doing it as a microformat would have required that the
> link be wrapped in HTML, and that means it's harder to manage with
> download managers, send in email, paste in IRC, use on a command line,
> or otherwise work with.

Maybe I didn't quite explain (or maybe I did, and you still think it
sucks) :-) What I envisaged was an HTML page having a "Download" link,
which would be to a file called foobar.metalink, served as text/html.
Supporting UAs would grab the file, note that it conformed to the
microformat, do the selection and start downloading. Non-supporting UAs
would see something like:

Please select a mirror and download mechanism:

* America, East Coast (HTTP)
* America, West Coast (HTTP)
* Bittorrent
...

which would be the HTML rendering of the microformat. This way, people
offering downloads (assuming they want to give everyone the choice)
don't need to do both a list for most people, and a Metalink for those
who have supporting clients. The version with the metalink URL embedded
in the base URL means that everyone who doesn't have metalink-capable
clients uses the same URL (i.e. the base one, before the #), negating
much of the point of having mirrors in the first place.

> And it is backwards compatible with old web browsers to the same
> extent that link fingerprints are, AFAICT.

Yes.

> Why not do link fingerprints as a microformat?

Because I think email (e.g. email notifications of security updates) is
a very important use case. In the link fingerprints case, it's important
for security that the fingerprint come via a different channel to the
data, so it has to be in the email. In the metalink case, the email can
contain a link to the metalink file directly, or to an HTML download
page which links to it, or whatever.

>> And
>> what's that bogus artefact "3" doing on there anyway?
>
> I think that's what the kids are calling "versioning" or
> "future-proofing" these days.

My gripe (and, admittedly, it's a small one) is that there was never a
metalink1 or 2 in this URL-based form. The 3 is an artefact of their
internal naming scheme.

> I think it's found one -- providing mirror information and hash values
> lets download clients select better (for the user and the set of
> providers) download behaviour, detect and deal with corruption, and
> generally reason about their fetching task in a superior way.
> Effective mirror selection and verification are not trivial tasks, as
> anyone who has tried to download a Linux distribution from a foreign
> site can likely attest.

But accomplishing this does not, to my mind, require a new XML format
which is not understood by older clients. The URL # thing is a hack
around the fact that they picked such a format.

If metalink takes off, a large proportion of download links on the web
will be of the form "The resource is here, no actually forget that, it's
here". Which seems wrong.

Gerv

Mike Shaver

May 29, 2007, 8:55:17 AM

to Gervase Markham, dev-apps...@lists.mozilla.org

On 5/29/07, Gervase Markham <ge...@mozilla.org> wrote:

> Maybe I didn't quite explain (or maybe I did, and you still think it
> sucks) :-) What I envisaged was an HTML page having a "Download" link,
> which would be to a file called foobar.metalink, served as text/html.
> Supporting UAs would grab the file, note that it conformed to the
> microformat, do the selection and start downloading. Non-supporting UAs
> would see something like:

That requires that the download client be able to understand HTML,
which is a pretty tall order.

> The version with the metalink URL embedded
> in the base URL means that everyone who doesn't have metalink-capable
> clients uses the same URL (i.e. the base one, before the #), negating
> much of the point of having mirrors in the first place.

It doesn't take away to option of doing

> > And it is backwards compatible with old web browsers to the same
> > extent that link fingerprints are, AFAICT.
>
> Yes.
>
> > Why not do link fingerprints as a microformat?
>
> Because I think email (e.g. email notifications of security updates) is
> a very important use case. In the link fingerprints case, it's important
> for security that the fingerprint come via a different channel to the
> data, so it has to be in the email.

So why not link to an HTML download page that has the hash info on it?
Or send the hash in another format (but not a "new XML format"
unknown to existing clients, natch!) as an attachment? Or send HTML
email with the additional markup?

(I think that email is the least-easily securable part of the update
chain there, given the relative availability of HTTPS vs S/MIME.)

I think that being able to work with mirrored, hash-checked,
multi-protocol download specifiers backwards-compatibly via simple
URLs is a pretty decent use case, and it's pretty hard to make a
microformat work everywhere a URL does. (Especially given the
necessary prevalence of HTML restrictions on many sites, where a URL
might be easily provided and mirroring and promote-to-BT might be
especially useful.)

> My gripe (and, admittedly, it's a small one) is that there was never a
> metalink1 or 2 in this URL-based form. The 3 is an artefact of their
> internal naming scheme.

Yeah, and where's IPv5, anyway?!? :)

> > I think it's found one -- providing mirror information and hash values
> > lets download clients select better (for the user and the set of
> > providers) download behaviour, detect and deal with corruption, and
> > generally reason about their fetching task in a superior way.
> > Effective mirror selection and verification are not trivial tasks, as
> > anyone who has tried to download a Linux distribution from a foreign
> > site can likely attest.
>
> But accomplishing this does not, to my mind, require a new XML format
> which is not understood by older clients.

There's no existing format for specifying those things, so I don't
know how you could convey those details without creating a new one.
And I don't know what the XML format choice has to do with the
fragment-ID use, TBH -- earlier you seemed to be arguing that they
should link directly to such a file, no?

> The URL # thing is a hack
> around the fact that they picked such a format.

No, it's a way to associate additional, optional data with the target
of the link in a backward-compatible form that can be used wherever a
URL is permitted. The link is still a valid place to fetch the file.

> If metalink takes off, a large proportion of download links on the web
> will be of the form "The resource is here, no actually forget that, it's
> here". Which seems wrong.

Many download links on the web are redirects, which means exactly
that, so that the server can do geo-ip checking or additional load
balancing.

Mike

Gervase Markham

May 30, 2007, 5:47:41 AM

Mike Shaver wrote:
> That requires that the download client be able to understand HTML,
> which is a pretty tall order.

Well, it depends how strict you make the microformat. I wasn't
anticipating that it would be embedded in an arbitrarily complex HTML
page. Top-of-the-head straw man:

...
<li><a href="bittorrent://foo.bar.com/DF4623">Bittorrent</a>
<li><a ref="gb" href="http://www.bar.com/wibble">HTTP (UK)</a>
...

where an automatic client would use a regexp to extract the URLs, from
which it can find the available protocols, and use the ref attributes to
find locations and pick the closest one.
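
A sketch of the regexp extraction described above, run against the two straw-man lines. The pattern assumes the attributes appear exactly in that order with double quotes, which is the kind of restriction such a microformat would have to impose; extract_links is a made-up name.

```python
import re

# Optional ref attribute, then href, both double-quoted, in that order.
LINK = re.compile(r'<a\s+(?:ref="(?P<ref>[^"]*)"\s+)?href="(?P<href>[^"]*)"')


def extract_links(html):
    """Return (protocol, url, ref) for each link the pattern finds."""
    links = []
    for match in LINK.finditer(html):
        url = match.group("href")
        protocol = url.split(":", 1)[0]
        links.append((protocol, url, match.group("ref")))
    return links


page = (
    '<li><a href="bittorrent://foo.bar.com/DF4623">Bittorrent</a>\n'
    '<li><a ref="gb" href="http://www.bar.com/wibble">HTTP (UK)</a>\n'
)
for link in extract_links(page):
    print(link)
```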

This would be a bit like the things Hixie did to restrict the complexity
of HTML for Pingback, so Pingback clients didn't need full HTML parsers.

>> The version with the metalink URL embedded
>> in the base URL means that everyone who doesn't have metalink-capable
>> clients uses the same URL (i.e. the base one, before the #), negating
>> much of the point of having mirrors in the first place.
>
> It doesn't take away to option of doing

Huh? :-)

>> Because I think email (e.g. email notifications of security updates) is
>> a very important use case. In the link fingerprints case, it's important
>> for security that the fingerprint come via a different channel to the
>> data, so it has to be in the email.
>
> So why not link to an HTML download page that has the hash info on it?

Because if the HTML download page is on the same server as the download,
then both could be hacked at the same time. And if they aren't, then you
need to have two independent servers, which takes you from "100% of
people can use this" to a much smaller number.

> (I think that email is the least-easily securable part of the update
> chain there, given the relative availability of HTTPS vs S/MIME.)

I think email has some great security properties. For example, I'm about
to send an email to my friend George telling him where to get a security
update. Can you intercept it and change it so he downloads the wrong thing?

>> But accomplishing this does not, to my mind, require a new XML format
>> which is not understood by older clients.
>
> There's no existing format for specifying those things, so I don't
> know how you could convey those details without creating a new one.

I merely meant that you could create a backwardly-compatible one - see
above.

> And I don't know what the XML format choice has to do with the
> fragment-ID use, TBH -- earlier you seemed to be arguing that they
> should link directly to such a file, no?

Because their XML format is not understood by e.g. Firefox 1.5, they
have to use the fragment ID trick to make things work in a
backwardly-compatible way. If they'd used a simple HTML page
microformat, they wouldn't need to use the fragment ID stuff.

>> If metalink takes off, a large proportion of download links on the web
>> will be of the form "The resource is here, no actually forget that, it's
>> here". Which seems wrong.
>
> Many download links on the web are redirects, which means exactly
> that, so that the server can do geo-ip checking or additional load
> balancing.

True.

So why do we need metalink at all? The server has the resources to do
geo-ip, load balancing, and so on. The only thing it doesn't know is which
download protocols the client supports. Perhaps that's the right fix - a
Supported-Protocols: HTTP header for requests. Then the server has all
the information it needs to make the best determination about what
mirror to use.
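A sketch of what the server side might look like, assuming the header carries a comma-separated list of protocol names (e.g. "Supported-Protocols: http, ftp"). That syntax, the mirror table and the function names are all hypothetical; the geo-ip lookup is reduced to a region code passed in by the caller:

```python
# Hypothetical mirror table: (protocol, region, url).
MIRRORS = [
    ("http", "gb", "http://www.bar.com/wibble"),
    ("ftp", "us", "ftp://ftp.foo.example/wibble"),
    ("http", "us", "http://us.foo.example/wibble"),
]


def parse_supported_protocols(header_value):
    """Parse a value such as 'http, ftp' into a set of lowercase names."""
    return {p.strip().lower() for p in header_value.split(",") if p.strip()}


def choose_mirror(header_value, client_region, mirrors=MIRRORS):
    """Return the URL to redirect to, or None if no mirror is usable."""
    supported = parse_supported_protocols(header_value)
    candidates = [m for m in mirrors if m[0] in supported]
    if not candidates:
        return None
    # Mirrors in the client's region sort first; the sort is stable,
    # so table order breaks ties.
    candidates.sort(key=lambda m: m[1] != client_region)
    return candidates[0][2]
```

The server would then answer with an ordinary redirect to the chosen URL, so clients that never send the header keep getting whatever the default is today.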

Gerv

Robert Sayre

Jun 3, 2007, 6:01:29 PM

to Mike Shaver, Gervase Markham, dev-apps...@lists.mozilla.org

Mike Shaver wrote:
>
> Why not do link fingerprints as a microformat?
>

Doing that would also avoid unilaterally extending RFC3986. A good thing.

- Rob

Gervase Markham

Jun 4, 2007, 5:44:37 AM

Robert Sayre wrote:
> Doing that would also avoid unilaterally extending RFC3986. A good thing.

I think (as I've noted in my reply to Mike) that it would also damage
various use cases. He's made the argument that metalinks should work
wherever a link works; I think that argument applies even more strongly
to Link Fingerprints.

Gerv

Robert Sayre

Jun 4, 2007, 12:38:03 PM

to Gervase Markham

Agree. I can see some benefits. How would we feel if some other browser
vendor extended the syntax for URIs?

Wouldn't it be polite to ask <http://lists.w3.org/Archives/Public/uri/>
for review?

- Rob

Gervase Markham

Jun 5, 2007, 5:34:03 AM

Robert Sayre wrote:
> Agree. I can see some benefits. How would we feel if some other browser
> vendor extended the syntax for URIs?

It depends. I want to draft an RFC first and get comments on it, and
have it published. I agree we shouldn't do this unilaterally.

> Wouldn't it be polite to ask <http://lists.w3.org/Archives/Public/uri/>
> for review?

Quite possibly. I thought URI schemes were generally defined by RFCs,
and the IETF, though?

Gerv

Mike Shaver

Jun 5, 2007, 8:40:13 AM

to Gervase Markham, dev-apps...@lists.mozilla.org

On 6/5/07, Gervase Markham <ge...@mozilla.org> wrote:

> Robert Sayre wrote:
> > Wouldn't it be polite to ask <http://lists.w3.org/Archives/Public/uri/>
> > for review?
>
> Quite possibly. I thought URI schemes were generally defined by RFCs,
> and the IETF, though?

I don't think anyone is suggesting that you not consult with other
stakeholders -- are you saying that you don't think it's appropriate
to ask the W3 list for their thoughts?

Mike

Robert Sayre

Jun 5, 2007, 1:58:24 PM

to Mike Shaver, Gervase Markham, dev-apps...@lists.mozilla.org

Mike Shaver wrote:
> On 6/5/07, Gervase Markham <ge...@mozilla.org> wrote:
>> Robert Sayre wrote:
>> > Wouldn't it be polite to ask <http://lists.w3.org/Archives/Public/uri/>
>> > for review?
>>
>> Quite possibly. I thought URI schemes were generally defined by RFCs,
>> and the IETF, though?

Yep, but the IETF isn't too fussy about mailing list locations. That
list is where the IETF discussions occur.

- Rob

Robert Sayre

Jun 5, 2007, 2:01:36 PM

to Gervase Markham

Gervase Markham wrote:
> Robert Sayre wrote:
>> Agree. I can see some benefits. How would we feel if some other
>> browser vendor extended the syntax for URIs?
>
> It depends. I want to draft an RFC first and get comments on it, and
> have it published. I agree we shouldn't do this unilaterally.
>

The terminology is confusing. A work-in-progress document that you would
like comments on is called an Internet-Draft. If you would like help
with producing one of these, let me know. I've written one or two.

The finished product is called an RFC, and this is what you have /after/
you've considered all the comments. :)

- Rob

Gervase Markham

Jun 6, 2007, 5:39:44 AM

Not at all. I'm sure it's appropriate.

Gerv

Gervase Markham

Jun 6, 2007, 5:40:19 AM

Robert Sayre wrote:
> The terminology is confusing. A work-in-progress document that you would
> like comments on is called an Internet-Draft. If you would like help
> with producing one of these, let me know. I've written one or two.

Yes, please. Ed Lee (currently working at the MoCo offices, next to Dan
Veditz) is the man.

Gerv