Transparent Metalinks?

Anthony Bryan

Mar 27, 2008, 4:00:34 PM
to Metalink Discussion
in talking to some people the other day, some are concerned about the
extra choices that may be present in downloads.

[normal download], [torrent], [metalink], [next download format]

I'd like to think that this list could be just 2 options, normal
download & metalink, since hopefully metalink will be able to describe
the next download format as well.

but for some people, even the 2 options are too much.

is there a way we can make using metalinks easier for people already
using metalink clients?

at first, we had a way similar to Link Fingerprints -
URL#!metalink3!URLtoMetalink - which was a bit of a hack but worked!
apparently against spec, but it was a nice way to piggyback a metalink
URL onto a regular URL. most clients dropped the extra info starting
at #, while metalink clients could just get the metalink.
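
A minimal Python sketch of how a metalink-aware client might split such a piggybacked URL. Only the `#!metalink3!` marker comes from the description above; the function name `split_metalink_fragment` and the example URLs are hypothetical.

```python
from typing import Optional, Tuple

MARKER = "#!metalink3!"  # marker as described in the post above


def split_metalink_fragment(url: str) -> Tuple[str, Optional[str]]:
    """Return (regular_url, metalink_url), with metalink_url None if absent."""
    base, sep, metalink = url.partition(MARKER)
    if sep and metalink:
        return base, metalink
    # No usable marker: act like an ordinary client and drop everything from '#'.
    return url.split("#", 1)[0], None


regular, metalink = split_metalink_fragment(
    "http://example.com/file.iso#!metalink3!http://example.com/file.iso.metalink"
)
```

A client without metalink support would only ever use the first element of the pair.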

another (unimplemented) idea was a microformat, where Operator or
FlashGot passed on the metalink URL to clients that could use it,
while other clients got the URL to the regular file.

I think the semantic web people had a similar problem, where a URL
could serve up HTML for browsers, or RDF for RDF-aware clients.

maybe something like if the client has the metalink MIME type in the
Accept header & they request a file that has a metalink describing it,
send the metalink first but then (if the same server serving the
metalink also has the regular file) if the metalink client requests
the file again it will get the regular file (& not go into some loop).

is that possible or is there an elegant solution?

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
)) Easier, More Reliable, Self Healing Downloads

Neil M.

Mar 28, 2008, 2:52:20 AM
to metalink-...@googlegroups.com
>
> maybe something like if the client has the metalink MIME type in the
> Accept header & they request a file that has a metalink describing it,
> send the metalink first but then (if the same server serving the
> metalink also has the regular file) if the metalink client requests
> the file again it will get the regular file (& not go into some loop).
>
> is that possible or is there an elegant solution?
>

How about just have the download link server look at the user agent and
then redirect the download link accordingly?

    if user_agent in known_metalink_clients:
        redirect("myfile.metalink")
    else:
        redirect("myfile.exe")

Neil

Neil M.

Mar 28, 2008, 3:14:18 AM
to metalink-...@googlegroups.com

Errrr, nevermind, I now see the error of my ways. That only works if
you manually plug the URL into the metalink client, it won't
automatically launch the client when clicking on the link.

I think to do what you are talking about you would need to check the
accept value when you present the download page to the user:

    if "metalink" in accept_header:
        print("myfile.metalink")
    else:
        print("myfile.exe")

Of course this requires a browser modification or a plugin to do right.
Once you have a plugin, you can do some javascript checking to see if
the plugin or browser version is correct and change the URL link too.

Neil

Sebastien WILLEMIJNS

Mar 28, 2008, 4:56:08 AM
to metalink-...@googlegroups.com

On Fri, 28 Mar 2008 00:14:18 -0700, "Neil M." <nabb...@gmail.com> said:

>     if "metalink" in accept_header:
>         print("myfile.metalink")
>     else:
>         print("myfile.exe")

i agree ;)

Anthony Bryan

Mar 28, 2008, 5:28:39 PM
to metalink-...@googlegroups.com

let's look at this from a native browser modification or plugin angle
(I'm guessing FlashGot could alter the Accept header if a metalink
compatible download manager is installed).

won't this loop, once the client requests myfile again to start the
actual download after it's requested it first and gotten the metalink?
assuming the metalink is stored on the same server as the file to be
downloaded...

SuperStar Stunna

Mar 28, 2008, 7:18:09 PM
to Metalink Discussion

I mentioned this last year when I was talking about a URI scheme for
Metalinks.
I don't think it could be any easier for the end user than simply
clicking a Metalink URL.

it's near the top of this thread:

Mirror URIs & MetaMirror Servers
http://groups.google.com/group/metalink-discussion/browse_thread/thread/d49035242333c0f/

Sebastien WILLEMIJNS

Mar 29, 2008, 3:24:23 AM
to metalink-...@googlegroups.com

On Fri, 28 Mar 2008 16:18:09 -0700 (PDT), "SuperStar Stunna"
<stu...@gmail.com> said:
>
>
> I mentioned this last year when I was talking about a URI scheme for
> Metalinks.
> I don't think it could be any easier for the end user than simply
> clicking a Metalink URL.
>
> it's near the top of this thread:
>
> Mirror URIs & MetaMirror Servers
> http://groups.google.com/group/metalink-discussion/browse_thread/thread/d49035242333c0f/

it is true that some months ago there was a thread about a small test
Nicolas ran on his own server, which made it possible to
serve an XML page that embeds the metalink in a classic
web page...

SuperStar Stunna

Mar 29, 2008, 4:24:48 PM
to Metalink Discussion

does this page still exist?
can you post a URL or demo page?



Sebastien WILLEMIJNS

Mar 29, 2008, 6:09:54 PM
to metalink-...@googlegroups.com

On Sat, 29 Mar 2008 13:24:48 -0700 (PDT), "SuperStar Stunna"
<stu...@gmail.com> said:
>
>
> does this page still exist?
> can you post a URL or demo page?
>

http://groups.google.fr/group/metalink-discussion/browse_thread/thread/0c02036a9db64119/c2547ba200fe7f6a#c2547ba200fe7f6a

Neil M.

Mar 29, 2008, 9:01:17 PM
to metalink-...@googlegroups.com

> won't this loop, once the client requests myfile again to start the
> actual download after it's requested it first and gotten the metalink?
> assuming the metalink is stored on the same server as the file to be
> downloaded...
>

Nope. There is never a request for "myfile" in this case. The two
situations are like this:

1. Metalink

- Browser requests download.html, server generates page with the
myfile.metalink download link since special accept header is present
- Browser requests myfile.metalink, pass file to metalink client
- Metalink client does normal download of myfile.exe

2. No Metalink client
- Browser requests download.html, server generates page with the normal
download link to myfile.exe.
- Browser requests myfile.exe as usual.

This isn't a complete solution as it requires some type of server side
scripting. A javascript component would probably cover any websites
that don't have that option.
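
A rough Python sketch of the server-side choice described in the two cases above. It assumes the client advertises the `application/metalink+xml` type that comes up later in this thread; the function name `download_link` and its parameters are hypothetical, and the file names are the placeholders used above.

```python
METALINK_TYPE = "application/metalink+xml"


def download_link(accept_header: str,
                  plain: str = "myfile.exe",
                  metalink: str = "myfile.metalink") -> str:
    """Pick the href to place in download.html for this request."""
    # Only an explicit mention of the metalink type counts; "*/*" does not.
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    return metalink if METALINK_TYPE in offered else plain
```

The file itself is never negotiated here, only the link printed on the page, which is why no loop can occur.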

Neil

Dr. Peter Poeml

Apr 7, 2008, 7:48:28 AM
to metalink-...@googlegroups.com

This sounds good because it will work. However, the requirement for the
server-side intelligence is something that I would tend to avoid.

I'd rather suggest the following:

- the client sends an Accept header indicating its ability, but not
application/metalink+xml, because nearly every client sends "Accept:
application/*" which would mean the same. Thus, some different header
is required. Either "Accept: x-application/metalink" (maybe), or a new
header, like "X-Metalink: yes"

- the server may notice and understand the header, and return a metalink
file (either by generating it, or by rewriting the request to
${url}.metalink). The rewrite could be an internal redirect or an
external redirect which sends the client to the new location (the one
with ".metalink" appended).

- _if_ the client is about to follow a link which it acquired from
within a metalink <url> element, in order to get the requested file from
the same server, it must not add the header which indicates its
ability, to avoid a loop.
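
The server half of this proposal could be sketched in Python roughly as follows. This is only an illustration under assumptions: it uses the "X-Metalink: yes" variant of the header, takes the set of metalink files on disk as a plain argument, and the function name `rewrite_target` is hypothetical. It is not mod_rewrite configuration.

```python
from typing import Mapping, Set


def rewrite_target(path: str,
                   headers: Mapping[str, str],
                   metalinks_on_disk: Set[str]) -> str:
    """Return the path the server should actually serve for this request."""
    # Assumes header names arrive spelled exactly as "X-Metalink".
    wants_metalink = headers.get("X-Metalink", "").strip().lower() == "yes"
    candidate = path + ".metalink"
    if wants_metalink and candidate in metalinks_on_disk:
        return candidate  # internal rewrite to ${url}.metalink
    return path  # no header, or no metalink for this file: serve it as is
```

A client that omits the header when following a `<url>` taken from a metalink always falls through to the last line, which is what prevents the loop.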

I think that would work best.

In my opinion, the server logic needed here, in order to handle
and distinguish metalink/non-metalink requests is minimal, and can be
implemented by pure mod_rewrite magic, and doesn't require a script, or
a content generator which creates "special" content. The metalink files
that it redirects to can simply be on-disk. (Or of course they can still
be created on the fly.)

Does that make sense?

I believe that such support for transparently negotiating metalink
handling will be a big leap forward for metalinks!

Thanks,
Peter
--
"WARNING: This bug is visible to non-employees. Please be respectful!"

SUSE LINUX Products GmbH
Research & Development

Dr. Peter Poeml

Apr 7, 2008, 6:20:37 PM
to metalink-...@googlegroups.com
On Mon, Apr 07, 2008 at 01:48:28PM +0200, Dr. Peter Poeml wrote:
> This sounds good because it will work. However, the requirement for the
> server-side intelligence is something that I would tend to avoid.
>
> I'd rather suggest the following:
>
> - the client sends an Accept header indicating its ability, but not
> application/metalink+xml, because nearly every client sends "Accept:
> application/*" which would mean the same. Thus, some different header
> is required. Either "Accept: x-application/metalink" (maybe), or a new
> header, like "X-Metalink: yes"
>
> - the server may notice and understand the header, and return a metalink
> file (either by generating it, or by rewriting the request to
> ${url}.metalink). The rewrite could be an internal redirect or an
> external redirect which sends the client to the new location (the one
> with ".metalink" appended).
>
> - _if_ the client is about to follow a link which it acquired from
> within a metalink <url> element, in order to get the requested file from
> the same server, it must not add the header which indicates its
> ability, to avoid a loop.

There is one bit that occurred to me later and that I didn't mention
explicitly here -- the case where the metalink-enabled client can't
know in advance whether the server is going to be metalink-capable, or
if it is a plain HTTP server which will return the requested object, and
not a metalink.

The client needs to be able to handle that "normal" (non-metalink)
reply. I don't know if that is practical, I hope so though -- I somehow
assume that a metalink client would, naturally, be able to act as normal
HTTP client.

Thus, the client could speak to non-metalink HTTP servers, as well as to
metalink-enabled HTTP servers which return metalinks only for some
files.

(Think of
- files that are not supposed to be redirected for various reasons
- mini files which can more efficiently be returned as is, instead of
sending a mirror list
- no mirror being available for certain files
)

> I think that would work best.
>
> In my opinion, the server logic needed here, in order to handle
> and distinguish metalink/non-metalink requests is minimal, and can be
> implemented by pure mod_rewrite magic, and doesn't require a script, or
> a content generator which creates "special" content. The metalink files
> that it redirects to can simply be on-disk. (Or of course they can still
> be created on the fly.)
>
> Does that make sense?
>
> I believe that such support for transparently negotiating metalink
> handling will be a big leap forward for metalinks!

Another consideration is to keep intermediate caches in mind. If the
response varies with regard to what the client sent, it probably needs a
Vary: header to indicate which part of the request is causing the
variation (metalink or not, in our case), or if nothing else works, make
the metalink response uncacheable.
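
A small Python sketch of the two caching options mentioned above. The function name `cache_headers` is hypothetical, and using "Cache-Control: no-store" for the uncacheable fallback is an assumption about how one might implement it, not something stated in this thread.

```python
def cache_headers(negotiated_on: str = "Accept", cacheable: bool = True) -> dict:
    """Response headers that keep shared caches from mixing up the variants."""
    if not cacheable:
        # Fallback: forbid caches from storing the response at all.
        return {"Cache-Control": "no-store"}
    # Name the request header that caused the variation.
    return {"Vary": negotiated_on}
```

With a dedicated request header such as "X-Metalink", the same function would be called as `cache_headers("X-Metalink")`.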

Nils

Apr 24, 2008, 5:48:58 PM
to Metalink Discussion
First of all, I don't share the concerns about application/*.
You would simply assign a q= value to application/metalink+xml, which the
server might then interpret (Apache Multiviews/TCN is capable of this, for
instance[1]).

Then the statement that most clients send application/* isn't exactly
true[2]...
And besides, this doesn't even actually matter.
Because you can tell the server to serve application/octet-stream
instead of application/metalink+xml if the client doesn't explicitly
ask for application/metalink+xml.
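
As an illustration, "serve the metalink only on explicit request" can be sketched in Python like this. It is a simplified Accept parser written for this example, not Apache's mod_negotiation algorithm; the names `parse_accept` and `choose_type` are hypothetical.

```python
def parse_accept(header: str) -> dict:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    ranges = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges[media] = q
    return ranges


def choose_type(header: str) -> str:
    """Pick the metalink type only when it is named explicitly with q > 0."""
    if parse_accept(header).get("application/metalink+xml", 0.0) > 0:
        return "application/metalink+xml"
    return "application/octet-stream"
```

Wildcards such as `*/*` or `application/*` are deliberately ignored when deciding for the metalink, so ordinary browsers keep getting the file.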

The client can tell from the response if it has been served a metalink
or not (Content-Type).
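
On the client side, that check could look like the following Python sketch (the function name `is_metalink_response` is hypothetical):

```python
def is_metalink_response(content_type: str) -> bool:
    """True if a Content-Type header value names the metalink media type."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/metalink+xml"
```

Anything else is treated as the file itself, so the client can still talk to servers that know nothing about metalinks.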

The "avoid loop" is indeed an important remark. ;)

All in all, the tools for "transparent" metalinks via TCN have existed for
quite some time.
There is no real need to "reinvent" the wheel for metalinks. ;)

I'll do some experiments myself, I guess.



[1]
http://httpd.apache.org/docs/2.2/mod/mod_negotiation.html

[2]
* Firefox:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

* Internet Explorer 7:
Accept: */*

* Opera:
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

Peter Poeml

Apr 27, 2008, 6:45:23 AM
to metalink-...@googlegroups.com
Hi Nils,

thanks for picking this up!

On Thu, Apr 24, 2008 at 02:48:58PM -0700, Nils wrote:
>
> First of all, I don't share the concerns about application/*.
> You would simply assign a q= value to application/metalink+xml, which the
> server might then interpret (Apache Multiviews/TCN is capable of this, for
> instance[1]).
>
> Then the statement that most clients send application/* isn't exactly
> true[2]...

Indeed, you are right. I now sampled some minutes of logs on a busy web
server, and I see that application/* turns up exactly once (msnbot/1.1),
while there are gazillions of clients sending */* (and lots that send no
accept header at all), but no other client which sends application/*.


> And besides, this doesn't even actually matter.
> Because you can tell the server to serve application/octet-stream
> instead of application/metalink+xml if the client doesn't explicitly
> ask for application/metalink+xml.

You are right. What I initially had in mind, is that some servers might
have a particular server-side "variant selection algorithm" in place,
which might lead to unwanted results, if the client sends either
application/* or */*. I had these concerns because I don't know very
well which existing mechanism might be out there. I now realize that
such mechanisms
- are maybe not in wide use, and Apache's mod_negotiation is probably
used primarily for selecting language variants of static pages.
- and that there isn't any existing server-side implementation which
negotiates metalinks yet, and no other negotiation mechanism would
return a metalink anyway (because it doesn't know about them.)

> The client can tell from the response if it has been served a metalink
> or not (Content-Type).
>
> The "avoid loop" is indeed an important remark. ;)

Exactly.

> All in all, the tools for "transparent" metalinks via TCN have existed for
> quite some time.
> There is no real need to "reinvent" the wheel for metalinks. ;)

Seems you are right :-)

> I'll do some experiments myself, I guess.

I have changed the download.opensuse.org server to negotiate upon the
Accept header now. I kept the Accept-Features negotiation in for now,
but it is scheduled to be removed later. (At the moment there is still
some documentation referring to it.)

Thus, metalinks can now be negotiated like this:

% curl -s -H "Accept: foobar,application/metalink+xml,*/*" 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt' | head -1
<?xml version="1.0" encoding="UTF-8"?>


The reply comes with
Content-Disposition: attachment; filename="GPLv3.txt.metalink"
Content-Type: application/metalink+xml; charset=UTF-8


For clients not sending "application/metalink+xml" in the accept header,
the file itself is returned.
% curl -s 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt' | head -1
GNU GENERAL PUBLIC LICENSE
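
The two curl invocations above can be mirrored with Python's standard library. This sketch only builds the requests; the helper name `build_request` is hypothetical, and the commented-out part would hit the network, where the 2008 URL may no longer behave as shown.

```python
import urllib.request

URL = "http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt"


def build_request(url: str, want_metalink: bool) -> urllib.request.Request:
    """Build a GET request, optionally asking for the metalink variant."""
    req = urllib.request.Request(url)
    if want_metalink:
        req.add_header("Accept", "application/metalink+xml,*/*")
    return req


# Uncomment to actually send the request:
# with urllib.request.urlopen(build_request(URL, True)) as resp:
#     print(resp.headers.get("Content-Type"))
```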


Can you try if that works for you / makes sense for you?

Let me know what you think.

Thanks!

Anthony Bryan

Apr 29, 2008, 2:08:19 PM
to metalink-...@googlegroups.com
On Sun, Apr 27, 2008 at 6:45 AM, Peter Poeml <po...@suse.de> wrote:
> % curl -s -H "Accept: foobar,application/metalink+xml,*/*" 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt' | head -1
> <?xml version="1.0" encoding="UTF-8"?>
>
>
> The reply comes with
> Content-Disposition: attachment; filename="GPLv3.txt.metalink"
> Content-Type: application/metalink+xml; charset=UTF-8
>
>
> For clients not sending "application/metalink+xml" in the accept header,
> the file itself is returned.
> % curl -s 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt' | head -1
> GNU GENERAL PUBLIC LICENSE

this is so nice & useful!

it brings up some issues regarding MIME type.
"application/metalink+xml" was originally chosen because it follows
the convention of other XML formats, but this is not officially
registered. (KDE had an issue with this).

looking at http://www.iana.org/cgi-bin/mediatypes.pl ("Note:
Registrations in the standards tree must be approved by the IESG and
must correspond to a formal publication by a recognized standards
body." "Standards Tree - " (blank)) makes it sound like
"application/metalink+xml" would not be possible until metalink is
approved by a standards group. (which I'm not against).

so, we could continue using the unofficial unregistered MIME type
(strangely, formats like RSS and bittorrent are not registered, at
least according to
http://www.iana.org/assignments/media-types/application/ ) or we could
register one.

I'd like to register and it looks like we'd be in the Vendor tree, at
least for now. I don't know if we could "upgrade" back to
application/metalink+xml later if approved by a standards group.

I think application/vnd.metalinker.org+xml would be good.

anyone see possible problems with that?

Nils

Apr 30, 2008, 9:46:41 AM
to Metalink Discussion


On Apr 29, 8:08 pm, "Anthony Bryan" <anthonybr...@gmail.com> wrote:
> On Sun, Apr 27, 2008 at 6:45 AM, Peter Poeml <po...@suse.de> wrote:
> >   % curl -s -H "Accept: foobar,application/metalink+xml,*/*" 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt'| head -1
> >  <?xml version="1.0" encoding="UTF-8"?>
>
> >  The reply comes with
> >  Content-Disposition: attachment; filename="GPLv3.txt.metalink"
> >  Content-Type: application/metalink+xml; charset=UTF-8
>
> >  For clients not sending "application/metalink+xml" in the accept header,
> >  the file itself is returned.
> >   % curl -s 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt'| head -1
> >                     GNU GENERAL PUBLIC LICENSE
>
> this is so nice & useful!
>
> it brings up some issues regarding MIME type.
> "application/metalink+xml" was originally chosen because it follows
> the convention of other XML formats, but this is not officially
> registered. (KDE had an issue with this).
>
> looking at http://www.iana.org/cgi-bin/mediatypes.pl ("Note:
> Registrations in the standards tree must be approved by the IESG and
> must correspond to a formal publication by a recognized standards
> body." "Standards Tree - " (blank)) makes it sound like
> "application/metalink+xml" would not be possible until metalink is
> approved by a standards group. (which I'm not against).
>
> so, we could continue using the unofficial unregistered MIME type
> (strangely, formats like RSS and bittorrent are not registered, at
> least according to
> http://www.iana.org/assignments/media-types/application/ ) or we could
> register one.
>
> I'd like to register and it looks like we'd be in the Vendor tree, at
> least for now. I don't know if we could "upgrade" back to
> application/metalink+xml later if approved by a standards group.
>
> I think application/vnd.metalinker.org+xml would be good.
>
> anyone see possible problems with that?

It might be the right thing to do (standards-wise), however this would
affect existing clients that already sniff for application/metalink+xml
(DownThemAll! does this for example).
You're right that a lot of media types are not actually registered. I
think this is basically because nobody actually cares about the
standards and the official registry, as it simply "works" in the wild,
and furthermore the registration process is kinda clumsy.
However, in the long run it would be appreciated if you got metalink to
be a recognized standard/RFC.

On Apr 27, 12:45 pm, Peter Poeml <po...@suse.de> wrote:
> I have changed the download.opensuse.org server to negotiate upon the
> Accept header now. I kept the Accept-Features negotiation in for now,
> but it is scheduled to be removed later. (At the moment there is still
> some documentation referring to it.)
>
> Thus, metalinks can now be negotiated like this:
>
> % curl -s -H "Accept: foobar,application/metalink+xml,*/*" 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt'| head -1
> <?xml version="1.0" encoding="UTF-8"?>
>
> The reply comes with
> Content-Disposition: attachment; filename="GPLv3.txt.metalink"
> Content-Type: application/metalink+xml; charset=UTF-8
>
> For clients not sending "application/metalink+xml" in the accept header,
> the file itself is returned.
> % curl -s 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt'| head -1
> GNU GENERAL PUBLIC LICENSE
>
> Can you try if that works for you / makes sense for you?
>
> Let me know what you think.

Sorry, I didn't find the time to test till now. I'll try it (if it is
still working) and get DownThemAll! trunk to work with it.

Peter Poeml

Apr 30, 2008, 10:33:30 AM
to metalink-...@googlegroups.com
Hi Nils,

On Wed, Apr 30, 2008 at 06:46:41AM -0700, Nils wrote:
> On Apr 27, 12:45 pm, Peter Poeml <po...@suse.de> wrote:
> > I have changed the download.opensuse.org server to negotiate upon the
> > Accept header now. I kept the Accept-Features negotiation in for now,
> > but it is scheduled to be removed later. (At the moment there is still
> > some documentation referring to it.)
> >
> > Thus, metalinks can now be negotiated like this:
> >
> > % curl -s -H "Accept: foobar,application/metalink+xml,*/*" 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt'| head -1
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > The reply comes with
> > Content-Disposition: attachment; filename="GPLv3.txt.metalink"
> > Content-Type: application/metalink+xml; charset=UTF-8
> >
> > For clients not sending "application/metalink+xml" in the accept header,
> > the file itself is returned.
> > % curl -s 'http://download.opensuse.org/distribution/10.3/repo/oss/GPLv3.txt'| head -1
> > GNU GENERAL PUBLIC LICENSE
> >
> > Can you try if that works for you / makes sense for you?
> >
> > Let me know what you think.
>
> Sorry, I didn't find the time to test till now. I'll try it (if it is
> still working) and get DownThemAll! trunk to work with it.

It actually didn't work "quite right" yesterday because I broke the
<size> element, which caused clients to refuse the metalink. But I just
fixed it, so it works as it should now. Sorry if you ran into that.

Nils

Apr 30, 2008, 10:44:05 AM
to Metalink Discussion
On Apr 30, 4:33 pm, Peter Poeml <po...@suse.de> wrote:
> Hi Nils,
>
[...]
> > Sorry, I didn't find the time to test till now. I'll try it (if it is
> > still working) and get DownThemAll! trunk to work with it.
>
> It actually didn't work "quite right" yesterday because I broke the
> <size> element, which caused clients to refuse the metalink. But I just
> fixed it, so it works as it should now. Sorry if you ran into that.
>
> Peter

I didn't as we ignore the size element at the moment for the real
download and instead rely on what the server returns...
(just the selection UI will display it).

I have to say adding the required functionality to DownThemAll! was
pretty much straightforward.
This nightly already includes the code, so you may try it yourself.
It also should avoid "loops"; that is, it will add
application/metalink+xml to Accept only if the download wasn't initiated
from a metalink.
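
The loop-avoidance rule can be written down in a few lines of Python. This is an illustrative sketch only, not the DownThemAll! code from the changeset below; the function name `accept_value` is hypothetical.

```python
def accept_value(initiated_from_metalink: bool) -> str:
    """Accept header value a metalink-aware client sends for a download."""
    if initiated_from_metalink:
        # The URL came out of a metalink: ask for the file only, so the
        # server does not answer with yet another metalink.
        return "*/*"
    return "application/metalink+xml,*/*"
```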

Mentioned nightly (Fx 2.0.0.8+):
http://code.downthemall.net/nightly/downthemall-nightly-20080430.xpi

Changeset:
http://bugs.code.downthemall.net/trac/changeset/958

Anthony Bryan

Jan 16, 2009, 7:13:31 PM
to Metalink Discussion
Eran Hammer-Lahav and Mark Nottingham have informed me that using
transparent content negotiation for serving a "description" of a file,
and not an alternative version (like PNG vs JPG) of the same thing has
been ruled against by the W3C TAG. see
http://esw.w3.org/topic/FindingResourceDescriptions

"Other ways of getting a description through HTTP
* Use content negotiation. If you ask for RDF, you get the
description. If you ask for something else, you get the thing
described. (The TAG, TimBL, and others have pointed out that this
contradicts web architecture, which requires that content negotiation
choose among things that all carry the same information. That goes for
CN between RDF and HTML as much as it does for CN between GIF and
JPEG.)"


the correct, web architecture compliant way to do this is apparently
the HTTP Link header:

Link: <http://example.com/resource.metalink>; rel="describedby";
type="application/metalink+xml";

http://tools.ietf.org/html/draft-nottingham-http-link-header-03
http://tools.ietf.org/html/draft-hammer-discovery-01
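
A client could pick the metalink URL out of such a header roughly like this. The Python sketch below is a deliberately simplified parser (it does not handle commas or semicolons inside quoted parameter values), and the function name `find_metalink` is hypothetical.

```python
import re
from typing import Optional

# One link-value: "<target>" followed by any number of ";param" pieces.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)')


def find_metalink(link_header: str) -> Optional[str]:
    """Return the target of the first rel="describedby" metalink link, if any."""
    for match in LINK_RE.finditer(link_header):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for param in raw_params.split(";"):
            name, sep, value = param.partition("=")
            if sep:
                params[name.strip().lower()] = value.strip().strip('"')
        rels = params.get("rel", "").lower().split()
        if "describedby" in rels and params.get("type") == "application/metalink+xml":
            return target
    return None
```

Clients that do not understand the header simply ignore it and download the file as usual.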

Eran Hammer-Lahav

Jan 17, 2009, 1:32:28 AM
to Metalink Discussion
The TAG has not officially ruled on this but individual members have
expressed the view that this is not consistent with HTTP. This came up
most recently in their review of Yadis, the discovery protocol used by
OpenID which uses the Accept header to request the descriptor instead
of the resource. There is no easy answer whether this is allowed or
not.

One way to look at this is as views of the same resource. If there was
one representation that is complete (i.e. contains both the file being
downloaded and its metalink information), you can argue that a
metalink-only view is valid, as well as the file itself. But without
such unifying view, it is hard to claim that these are two
representations of the same resource. One is the resource, the other
is 'about' it.

This gets philosophical very quickly...

I have been reviewing this for the past few months for XRD, another
descriptor format (POWDER is yet another). My use case is more complex
than metalink because sometimes there is no resource, just metadata
(abstract resources). I have a few other requirements which led me to
write my discovery spec [2]. However, in the case of metalink, I think
the Link header should be enough for your needs since there is always
an HTTP URI you can perform GET on to find the Link header [1].

The downside is that it adds an extra round trip (one for the header,
another for the metalink document). There is no easy way around it
unless you are willing to use HTTP OPTIONS, etc. I wrote a blog post
[3] about this which is included in my draft. One thing sites with a
lot of downloads can do is use the Site-Meta options in my discovery
draft to declare a URI which can be used to request the metalink
document for other resource URIs.

But again, I think it would be enough for metalink to simply use the
Link header [1].

Let me know if I can help in any way.

EHL

[1] http://tools.ietf.org/html/draft-nottingham-http-link-header-03
[2] http://tools.ietf.org/html/draft-hammer-discovery-01
[3] http://www.hueniverse.com/hueniverse/2008/09/discovery-and-h.html



Nils

Jan 17, 2009, 9:09:50 AM
to Metalink Discussion
In my opinion Metalinks are a representation of the actual resource
BECAUSE they are a description of something that actually exists, and
are not merely a link/pointer/reference.
Hence TCN is correctly used here.
And hence there is no problem at all.

And I think a philosophical discussion of "representation vs.
description vs. complete view" in general and of the intention the
authors of 1997 and 1998 RFCs had in particular is nonsense.

There is a valid point for Link:, e.g. for referencing previous
chapters of a document or telling the client "hey, I have an
alternative representation here you didn't think of requesting
yourself you might still be interested in", i.e when you actually want
to reference/point another resource.

When it comes to metalink TCN, the client explicitly has to ask the
server for this representation, and hence there is no problem for non-
metalink clients receiving the fairly degraded (to them at least)
metalink view.
(Maybe this should be clarified in the Metalink spec, especially that
servers should assign a pretty low q=-value to metalink
representations of a resource to avoid sending out metalinks to non-
metalink clients).

Eran Hammer-Lahav

Jan 17, 2009, 2:10:21 PM
to Metalink Discussion

On Jan 17, 6:09 am, Nils <Maier...@web.de> wrote:
> In my opinion Metalinks are a representation of the actual resource
> BECAUSE they are a description of something that actually exists, and
> are not merely a link/pointer/reference.
> Hence TCN is correctly used here.
> And hence there is no problem at all.

This is a bit of a contradiction. Something cannot be a
representation of a resource and a description of it at the same time.
But of course people can (and have) stretched the meaning of
'representation' to be, well, pretty much anything.

> And I think a philosophical discussion of "representation vs.
> description vs. complete view" in general and of the intention the
> authors of 1997 and 1998 RFCs had in particular is nonsense.

I am not sure how 1997/8 are related to this.

> There is a valid point for Link:, e.g. for referencing previous
> chapters of a document or telling the client "hey, I have an
> alternative representation here you didn't think of requesting
> yourself you might still be interested in", i.e when you actually want
> to reference/point another resource.

Link defines a relationship between any two resources. "My Metalink
description" is a valid relationship.

> When it comes to metalink TCN, the client explicitly has to ask the
> server for this representation, and hence there is no problem for non-
> metalink clients receiving the fairly degraded (to them at least)
> metalink view.
> (Maybe this should be clarified in the Metalink spec, especially that
> servers should assign a pretty low q=-value to metalink
> representations of a resource to avoid sending out metalinks to non-
> metalink clients).

Again, there is nothing 'technically' wrong with this approach, and
others have taken a similar position that a descriptor can be
collapsed into a single resource URI. But a consensus is building that
this is the wrong way of doing things. What you might want to
consider, since you find the semantic discussion to be nonsense (which I
can respect), is the deployment ramifications of using the Accept
header. Many platforms limit access to such headers, some proxies
mishandle Vary headers (which, BTW, the spec should require with any
Accept-negotiated reply), and some providers will not allow using it on
their servers. You might want to read John Panzer's view of this [1].

You can also support both approaches, allowing providers to accept the
Accept header, but also declare the availability of metalink using the
Link header. Dumb clients can then do a HEAD on the URI if they cannot
use Accept.
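
As a rough illustration of that fallback, here is a minimal sketch of what a client could do with the Link header value it gets back from such a HEAD request. It assumes the header syntax from the link-header draft as quoted in this thread (rel="describedby", type="application/metalink+xml"); the function name find_metalink_link is made up for this example, and the parsing is deliberately simplified rather than a full implementation of the draft's grammar.

```python
import re

METALINK_TYPE = "application/metalink+xml"


def find_metalink_link(link_header):
    """Return the target URI of the first rel="describedby" link whose
    type is the metalink MIME type, or None if there is no such link."""
    # Split on commas that are followed by the "<" of the next link-value.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part, re.S)
        if not match:
            continue
        uri, params_text = match.groups()
        params = {}
        # Parameters look like ;name="quoted value" or ;name=token
        for name, quoted, bare in re.findall(
            r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))', params_text
        ):
            params[name.lower()] = quoted or bare
        rels = params.get("rel", "").lower().split()
        if "describedby" in rels and params.get("type") == METALINK_TYPE:
            return uri
    return None


header = ('<http://example.com/resource.metalink>; rel="describedby"; '
          'type="application/metalink+xml";')
print(find_metalink_link(header))
```

A client that gets None back would simply continue with the plain GET, so servers without any metalink support are unaffected.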

EHL

[1] http://www.abstractioneer.org/2008/11/discovery-metadata-is-just-data.html

Nicolas Alvarez

Jan 17, 2009, 2:32:43 PM
to metalink-...@googlegroups.com
Anthony Bryan wrote:
> Eran Hammer-Lahav and Mark Nottingham have informed me that using
> transparent content negotiation for serving a "description" of a file,
> and not an alternative version (like PNG vs JPG) of the same thing has
> been ruled against by the W3C TAG. see
> http://esw.w3.org/topic/FindingResourceDescriptions

Ugh, first we're told not to use file.iso#!metalink!file.metalink, and now
this...

But on second thought, discouraging this use seems correct in principle...

Nicolas Alvarez

Jan 17, 2009, 2:35:02 PM
to metalink-...@googlegroups.com
Eran Hammer-Lahav wrote:
>> And I think a philosophical discussion of "representation vs.
>> description vs. complete view" in general and of the intension the
>> authors of 1997 and 1998 RFCs had in particular is non-sense.
>
> I am not sure how 1997/8 are related to this.

Those seem to be years, not RFC numbers.


Nils

Jan 19, 2009, 1:14:30 PM
to Metalink Discussion
On Jan 17, 8:10 pm, Eran Hammer-Lahav <e...@hueniverse.com> wrote:
> On Jan 17, 6:09 am, Nils <Maier...@web.de> wrote:
>
> > In my opinion Metalinks are a representation of the actual resource
> > BECAUSE they are a description of something that actually exists, and
> > are not merely a link/pointer/reference.
> > Hence TCN is correctly used here.
> > And hence there is no problem at all.
>
> This is a bit of a contradiction. Something cannot be a
> representation of a resource and a description of it at the same time.
> But of course people can (and have) stretched the meaning of
> 'representation' to be, well, pretty much anything.

A metalink is a representation of a resource, just as a JPEG is a
representation.
Metalinks may be far more "degraded" and contain pointers to a better
version.
A description is always also a representation of the resource; it
is usually just (heavily) degraded.
(My philosophical view)

> > And I think a philosophical discussion of "representation vs.
> > description vs. complete view" in general and of the intension the
> > authors of 1997 and 1998 RFCs had in particular is non-sense.
>
> I am not sure how 1997/8 are related to this.

I meant the years in which HTTP/1.1 and TCN were published, respectively.
Sorry for the misleading wording.
>
> > There is a valid point for Link:, e.g. for referencing previous
> > chapters of a document or telling the client "hey, I have an
> > alternative representation here you didn't think of requesting
> > yourself you might still be interested in", i.e when you actually want
> > to reference/point another resource.
>
> Link defines a relationship between any two resources. "My Metalink
> description" is a valid relationship.
>
I agree. And it might be a good thing to have Link pointers for dumb
clients.
But Link doesn't fulfill the purpose/use case of TCN as used by
metalink.
Furthermore, it adds more load to the server, as it has to (start to)
serve the download in question.

> > When it comes to metalink TCN, the client explicitly has to ask the
> > server for this representation, and hence there is no problem for non-
> > metalink clients receiving the fairly degraded (to them at least)
> > metalink view.
> > (Maybe this should be clarified in the Metalink spec, especially that
> > > servers should assign a pretty low q=-value to metalink
> > representations of a resource to avoid sending out metalinks to non-
> > metalink clients).
>
> Again, there is nothing 'technically' wrong with this approach any
> others have taken a similar position that a descriptor can be
> collapsed into a single resource URI.
The TAG members argue that what we are doing basically violates the
intent of the RFCs in question, which makes it technically wrong, no?

> But a consensus is building that
> this is the wrong way of doing things.
Unless there is a right way (and, as mentioned above, Link doesn't
fulfill the requirements (to reduce server load, especially the number
of requests)), such a consensus is useless.
To avoid any given "wrong way", there has to be a right way to
start with.

> What you might want to
> consider, since you find the semantic discussion as nonsense (which I
> can respect) is the deployment ramifications of using the Accept
> header. Many platforms limit access to such headers.
Such platforms likely limit everything else, too, like a potential
Link header.

> some proxies
> mishandle Vary headers (which BTW, the spec should require with to any
> Accept reply).
So should we now abandon TCN altogether, even for the "blessed" use
cases?

> and some providers will not allow using it on their
> servers.
But is there a guarantee that you may use Link headers?

> You might want to read John Panzer's view of this [1].
The only "new" discussion point would be "Extendable - limited to a
single content-type for metadata, and does not allow any existing
schemas (with well known content-type)."
TCN Accept allows specifying multiple types and even associating a
preference with each of them.
For example, DownThemAll! currently sends an Accept header like this
(Firefox default + metalink):
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, application/metalink+xml;q=0.9
I could imagine that other headers like the following would do a great
job, too:
Accept: */*;q=0.8, application/metalink+xml,application/x-bittorrent;q=0.9
Meaning: Return a metalink or a torrent if you've got one, else return
whatever you've got instead.
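
To make the q-value handling concrete, here is a minimal sketch of how a server might read such an Accept header. The function names parse_accept and explicit_quality are hypothetical, and the parser ignores media-type parameters other than q; it is an illustration, not code from any real server or from DownThemAll!.

```python
METALINK_TYPE = "application/metalink+xml"


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means q=1
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def explicit_quality(header, media_type):
    """Return q for media_type only if the client names it explicitly,
    else None. A bare */* does not count as asking for a metalink."""
    for name, q in parse_accept(header):
        if name == media_type.lower():
            return q
    return None


dta = ("text/html,application/xhtml+xml,application/xml;q=0.9,"
       "*/*;q=0.8, application/metalink+xml;q=0.9")
print(explicit_quality(dta, METALINK_TYPE))
```

Requiring an explicit mention is a design choice in this sketch: a plain browser that only sends */* would otherwise match the metalink type too, which is exactly the case the low server-side q-value is meant to guard against.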

Other than this, he details some great ideas on how to overcome some
potential issues.

>
> You can also support both approaches, allowing providers to accept the
> Accept header, but also declare the availability of metalink using the
> Link header. Dumb clients can then do a HEAD on the URI if they cannot
> use Accpet.

I'm all in favor of adding Link to metalink, not as a replacement for
TCN but in addition to it.

I also read your overview about different discovery methods [2].
Great overview.

TCN, however, still works best for the given use case, and except for
some "philosophical" issues and implementation/deployment obstacles I
don't see any show-stoppers there.

>
> EHL

Cheers
Nils

>
> [1]http://www.abstractioneer.org/2008/11/discovery-metadata-is-just-
data...

[2] http://www.hueniverse.com/hueniverse/2008/09/discovery-and-h.html

(Note: Resent after mistakenly sending only to Eran)

Anthony Bryan

Jan 19, 2009, 7:05:02 PM
to metalink-...@googlegroups.com
Eran, thanks for joining us!

On Sat, Jan 17, 2009 at 2:10 PM, Eran Hammer-Lahav <er...@hueniverse.com> wrote:
>
> Again, there is nothing 'technically' wrong with this approach any
> others have taken a similar position that a descriptor can be
> collapsed into a single resource URI. But a consensus is building that
> this is the wrong way of doing things. What you might want to
> consider, since you find the semantic discussion as nonsense (which I
> can respect) is the deployment ramifications of using the Accept
> header. Many platforms limit access to such headers, some proxies
> mishandle Vary headers (which BTW, the spec should require with to any
> Accept reply), and some providers will not allow using it on their
> servers. You might want to read John Panzer's view of this [1].

what should the spec require? could you propose some text, I'm not
familiar w/ that.

Eran started a thread about TCN on the HTTP list at
http://lists.w3.org/Archives/Public/ietf-http-wg/2009JanMar/0014.html
(it wouldn't hurt the draft process for metalink people to be involved
on there :) which includes Mark's reply:

"To my knowledge, caching intermediaries haven't deployed it (i.e.,
they'll work with TCN, but they won't be able to serve negotiated
requests from cache... somebody please correct me if I'm wrong).

I'm not sure about browser implementation, but I did a quick check of
the request headers seen by a very high-traffic Web site, and a
vanishingly small number contained the Negotiate header..."

(someone had been working on a metalink plugin for squid).


I figured it wouldn't hurt to quote what we use now & what we could
use in the future from Eran's draft directly:

http://tools.ietf.org/html/draft-hammer-discovery-01

Appendix A.2.1. HTTP Response Header


When a resource representation is retrieved using and HTTP GET
request, the server includes in the response a header pointing to the
location of the descriptor document. For example, POWDER uses the
'Link' response header to create an association between the resource
and its descriptor. XRDS [XRDS] (based on the Yadis protocol
[Yadis]) uses a similar approach, but since the Link header was not
available when Yadis was first drafted, it defines a custom header
X-XRDS-Location which serves a similar but less generic purpose.

[+] Self Declaration - using the Link header, any resource can point
to its descriptor documents.

[-] Direct Descriptor Access - the header is only accessible when
requesting the resource itself via an HTTP GET request. While
HTTP GET is meant to be a safe operation, it is still possible for
some resource to have side-effects.

[+] Web Architecture Compliant - uses the Link header which is an
IETF Internet Standard [[ currently a standard-track draft ]], and
is consistent with HTTP protocol design.

[-] Scale and Technology Agnostic - since discovery accounts for a
small percent of resource requests, the extra Link header is
wasteful. For some hosted servers, access to HTTP headers is
limited and will prevent implementation.

[+] Extensible - the Link header provides built-in extensibility by
allowing new link relationships, mime-types, and other extensions.

Minimum roundtrips to retrieve the resource descriptor: 2

Appendix A.2.2. HTTP Response Header Via HEAD


Same as the HTTP Response Header method but used with an HTTP HEAD
request. The idea of using the HEAD method is to solve the wasteful
overhead of including the Link header in every reply. By limiting
the appearance of the Link header only to HEAD responses, typical GET
requests are not encumbered by the extra bytes.

[+] Self Declaration - Same as the HTTP Response Header method.

[-] Direct Descriptor Access - Same as the HTTP Response Header
method.

[-] Web Architecture Compliant - HTTP HEAD should return the exact
same response as HTTP GET with the sole exception that the
response body is omitted. By adding headers only to the HEAD
response, this solution violates the HTTP protocol and might not
work properly with proxies as they can return the header of the
cached GET request.

[+] Scale and Technology Agnostic - solves the wasted bandwidth
associated with the HTTP Response Header method, but still suffers
from the limitation imposed by requiring access to HTTP headers.

[+] Extensible - Same as the HTTP Response Header method.

Minimum roundtrips to retrieve the resource descriptor: 2

Appendix A.2.3. HTTP Content Negotiation


Using the HTTP Accept request header or Transparent Content
Negotiation as defined in [RFC2295], the consumer informs the server
it is interested in the descriptor and not the resource itself, to
which the server responds with the descriptor document or its
location. In Yadis, the consumer sends an HTTP GET (or HEAD) request
to the resource URI with an Accept header and content-type
application/xrds+xml. This informs the server of the consumer's
discovery interest, which in turn may reply with the descriptor
document itself, redirect to it, or return its location via the
X-XRDS-Location response header.

[-] Self Declaration - does not address as it focuses on the
consumer declaring its intentions.

[+] Direct Descriptor Access - provides a simple method for directly
requesting the descriptor document.

[-] Web Architecture Compliant - while it can be argued that the
descriptor can be considered another representation of the
resource, it is very much external to it. Using the Accept header
to request a separate resource (as opposed to a different
representation of the same resource) violates web architecture.
It also prevents using the discovery content-type as a valid
(self-standing) web resource having its own descriptor.

[-] Scale and Technology Agnostic - requires access to HTTP request
and response headers, as well as the registration of multiple
handlers for the same resource URI based on the Accept header. In
addition, improper use or implementation of the Vary header in
conjunction with the Accept header will cause caches to serve the
descriptor document instead of the resource itself - a great
concern to large providers with frequently visited front-pages.

[-] Extensible - applies an implicit relationship type to the
descriptor mime-type, limiting descriptor formats to a single
purpose. It also prevents using existing mime-types from being
used as a descriptor format.

Minimum roundtrips to retrieve the resource descriptor: 1

Anthony Bryan

Jan 19, 2009, 7:49:29 PM
to metalink-...@googlegroups.com
ps - I forgot to mention that some clients, like Firefox, are not
willing to change the Accept header.

unfortunately, this rules out content negotiation, one of the
easiest/coolest features metalink uses.

Nicolas Alvarez

Jan 19, 2009, 9:28:01 PM
to metalink-...@googlegroups.com
Nils wrote:
> Futhermore it adds more load to the server as it has to (start to)
> serve the download in question.

It doesn't if you use HEAD.


Nils

Feb 5, 2009, 5:00:15 PM
to Metalink Discussion
You still need to do an additional round trip for this (the HEAD
prior to the GET),
for all downloads, even those that are not metalink enabled.
This will eventually break a lot of downloads, namely those magic
"valid only one time" links (see: one-click hosters and others).
Not to mention that it will add load to each server.

Let's put this into perspective:
I know that a lot of DownThemAll! users regularly download thousands
of images. The same, of course, applies to users of other download
managers as well.
Using HEAD to discover metalinks will double the number of requests
and hence put strain on both the user's uplink and the servers in
question.

Nils

Feb 5, 2009, 5:34:19 PM
to Metalink Discussion
On Jan 20, 1:05 am, Anthony Bryan <anthonybr...@gmail.com> wrote:
> Eran, thanks for joining us!
>
> On Sat, Jan 17, 2009 at 2:10 PM, Eran Hammer-Lahav <e...@hueniverse.com> wrote:
>
> > Again, there is nothing 'technically' wrong with this approach any
> > others have taken a similar position that a descriptor can be
> > collapsed into a single resource URI. But a consensus is building that
> > this is the wrong way of doing things. What you might want to
> > consider, since you find the semantic discussion as nonsense (which I
> > can respect) is the deployment ramifications of using the Accept
> > header. Many platforms limit access to such headers, some proxies
> > mishandle Vary headers (which BTW, the spec should require with to any
> > Accept reply), and some providers will not allow using it on their
> > servers. You might want to read John Panzer's view of this [1].
>
> what should the spec require? could you propose some text, I'm not
> familiar w/ that.
>
> Eran started a thread about TCN on the HTTP list at
> http://lists.w3.org/Archives/Public/ietf-http-wg/2009JanMar/0014.html
> (it wouldn't hurt the draft process for metalink people to be involved
> on there :) which includes Mark's reply:
>
> "To my knowledge, caching intermediaries haven't deployed it (i.e.,
> they'll work with TCN, but they won't be able to serve negotiated
> requests from cache... somebody please correct me if I'm wrong).
>
> I'm not sure about browser implementation, but I did a quick check of
> the request headers seen by a very high-traffic Web site, and a
> vanishingly small number contained the Negotiate header..."

A typical website does not have much to negotiate...
Likely only image representations, if different ones are even
available.

What you will likely see far more often is:
Vary: Accept-Encoding
A lot of sites offer either "plain" or gzip encoding. This is, as far
as I know, relatively widely deployed.
Hence I guess (but have no concrete data to back up that guess) that
most proxies today in fact have no problem dealing with this.
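
To spell out what "dealing with this" means for a cache, here is a minimal sketch of a cache key that honors Vary. The function name cache_key and the key layout are assumptions for illustration only; real caches such as squid do more (header normalization, "Vary: *", and so on).

```python
def cache_key(method, url, request_headers, vary):
    """Build a cache key from the URL plus every request header that
    the stored response named in its Vary header."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    selected = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, selected)


url = "http://example.com/file.iso"
browser = {"Accept": "text/html,*/*;q=0.8"}
metalink_client = {"Accept": "*/*;q=0.8, application/metalink+xml"}

# With "Vary: Accept" the two clients get separate cache entries.
print(cache_key("GET", url, browser, "Accept")
      != cache_key("GET", url, metalink_client, "Accept"))

# Without Vary both requests map to the same entry, so a cached metalink
# could be handed to the browser. That is the failure mode to avoid.
print(cache_key("GET", url, browser, "")
      == cache_key("GET", url, metalink_client, ""))
```

The same mechanism covers Accept-Encoding: a response stored with "Vary: Accept-Encoding" is keyed on that request header instead.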

>
> (someone had been working on a metalink plugin for squid).
>
> I figured it wouldn't hurt to quote what we use now & what we could
> use in the future from Eran's draft directly:
>
> http://tools.ietf.org/html/draft-hammer-discovery-01
>
> Appendix A.2.1. HTTP Response Header
>
>    When a resource representation is retrieved using and HTTP GET
>    request, the server includes in the response a header pointing to the
>    location of the descriptor document.  For example, POWDER uses the
>    'Link' response header to create an association between the resource
>    and its descriptor.  XRDS [XRDS] (based on the Yadis protocol
>    [Yadis]) uses a similar approach, but since the Link header was not
>    available when Yadis was first drafted, it defines a custom header
>    X-XRDS-Location which serves a similar but less generic purpose.
>
>    [+] Self Declaration -  using the Link header, any resource can point
>       to its descriptor documents.

Nice to have, but in my opinion not a strict requirement, at least
from the download manager perspective.

>
>    [-] Direct Descriptor Access -  the header is only accessible when
>       requesting the resource itself via an HTTP GET request.  While
>       HTTP GET is meant to be a safe operation, it is still possible for
>       some resource to have side-effects.
>
>    [+] Web Architecture Compliant -  uses the Link header which is an
>       IETF Internet Standard [[ currently a standard-track draft ]], and
>       is consistent with HTTP protocol design.

It is a draft of a draft of a proposed standard.
Which bears the risk that the draft may change after first
implementations arrive.

>    [-] Scale and Technology Agnostic -  since discovery accounts for a
>       small percent of resource requests, the extra Link header is
>       wasteful.  For some hosted servers, access to HTTP headers is
>       limited and will prevent implementation.
>
>    [+] Extensible -  the Link header provides built-in extensibility by
>       allowing new link relationships, mime-types, and other extensions.

Extensibility is a strong plus.

>
>    Minimum roundtrips to retrieve the resource descriptor: 2

Wasting resources by the additional round trip.

>
> Appendix A.2.2. HTTP Response Header Via HEAD
>
>    Same as the HTTP Response Header method but used with an HTTP HEAD
>    request.  The idea of using the HEAD method is to solve the wasteful
>    overhead of including the Link header in every reply.  By limiting
>    the appearance of the Link header only to HEAD responses, typical GET
>    requests are not encumbered by the extra bytes.
>
>    [+] Self Declaration -  Same as the HTTP Response Header method.
>
>    [-] Direct Descriptor Access -  Same as the HTTP Response Header
>       method.
>
>    [-] Web Architecture Compliant -  HTTP HEAD should return the exact
>       same response as HTTP GET with the sole exception that the
>       response body is omitted.  By adding headers only to the HEAD
>       response, this solution violates the HTTP protocol and might not
>       work properly with proxies as they can return the header of the
>       cached GET request.
>
>    [+] Scale and Technology Agnostic -  solves the wasted bandwidth
>       associated with the HTTP Response Header method, but still suffers
>       from the limitation imposed by requiring access to HTTP headers.

However, the second round trip becomes "mandatory".
With the Header method, if there is no Link header you won't do
a second request.
Here you always need to do a second request if you're trying to
discover something.
(The second request is either the GET of the download or the GET of
the metalink, depending on the presence of a metalink.)
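
The request counts can be written down as a small sketch. The method labels ("link-on-get", "link-on-head", "conneg") and function names are made up here, and the model only counts requests sent to the origin server until the client holds either the file or the metalink; requests to mirrors afterwards are not counted.

```python
def requests_per_download(method, has_metalink):
    """Requests to the origin server for one download."""
    if method == "link-on-get":
        # The GET already starts delivering the file; a second request
        # is only needed when a Link header points to a metalink.
        return 2 if has_metalink else 1
    if method == "link-on-head":
        # HEAD first, then a GET for either the file or the metalink.
        return 2
    if method == "conneg":
        # One GET with an Accept header returns the metalink or the file.
        return 1
    raise ValueError("unknown discovery method: " + method)


def total_requests(method, downloads, with_metalink=0):
    """Total origin requests for a batch of downloads."""
    without = downloads - with_metalink
    return (with_metalink * requests_per_download(method, True)
            + without * requests_per_download(method, False))


# A thousand image downloads, none of them metalink enabled:
for method in ("link-on-get", "link-on-head", "conneg"):
    print(method, total_requests(method, 1000))
```

Under these assumptions only the HEAD variant doubles the request count for servers that have no metalinks at all.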

>
>    [+] Extensible -  Same as the HTTP Response Header method.
>
>    Minimum roundtrips to retrieve the resource descriptor: 2

Actually worse than the Header method. See above.

>
> Appendix A.2.3. HTTP Content Negotiation
>
>    Using the HTTP Accept request header or Transparent Content
>    Negotiation as defined in [RFC2295], the consumer informs the server
>    it is interested in the descriptor and not the resource itself, to
>    which the server responds with the descriptor document or its
>    location.  In Yadis, the consumer sends an HTTP GET (or HEAD) request
>    to the resource URI with an Accept header and content-type
>    application/xrds+xml.  This informs the server of the consumer's
>    discovery interest, which in turn may reply with the descriptor
>    document itself, redirect to it, or return its location via the
>    X-XRDS-Location response header.
>
>    [-] Self Declaration -  does not address as it focuses on the
>       consumer declaring its intentions.

Not strictly required for the current use case.
The Header method could additionally be used if Self Declaration is
wanted.

>
>    [+] Direct Descriptor Access -  provides a simple method for directly
>       requesting the descriptor document.

Biggest plus, which seals the deal for me.

>
>    [-] Web Architecture Compliant -  while it can be argued that the
>       descriptor can be considered another representation of the
>       resource, it is very much external to it.  Using the Accept header
>       to request a separate resource (as opposed to a different
>       representation of the same resource) violates web architecture.

Again, the philosophical dispute over whether the "descriptor" in our
case is a completely different resource or merely a representation.

>       It also prevents using the discovery content-type as a valid
>       (self-standing) web resource having its own descriptor.

I don't see/get this.
Anyway, I don't think that this affects our use case.

>
>    [-] Scale and Technology Agnostic -  requires access to HTTP request
>       and response headers, as well as the registration of multiple
>       handlers for the same resource URI based on the Accept header.  In
>       addition, improper use or implementation of the Vary header in
>       conjunction with the Accept header will cause caches to serve the
>       descriptor document instead of the resource itself - a great
>       concern to large providers with frequently visited front-pages.

All major implementations seem to support Vary well enough
(at least for Accept-Encoding).
The risk that some implementations are broken shouldn't
keep us from using it.
After all, nobody turned off the web even though clients have lots of
bugs in all major areas, from http/https to html/css/js.
Those few broken implementations need to be corrected, or they will
someday be replaced by working competitors anyway.
"Backward compatibility" is always nice to have, but it should not
completely prevent the deployment of new technology, methods or uses.

>
>    [-] Extensible -  applies an implicit relationship type to the
>       descriptor mime-type, limiting descriptor formats to a single
>       purpose.  It also prevents using existing mime-types from being
>       used as a descriptor format.

Valid, but not an issue for our use case.

>
>    Minimum roundtrips to retrieve the resource descriptor: 1
>

That's what I'm looking for.


Nils

Peter Poeml

Feb 17, 2009, 5:25:42 PM
to metalink-...@googlegroups.com
Hi Eran,
hi list,

Well, this unfortunately misses the point of what we are doing.
Avoiding the additional round-trip is key to putting this to use in a
high-scalability setting. An additional round-trip is prohibitive for me.

Without getting too philosophical, I see no problem in using content
negotiation to achieve this. Of all the options it seems to be the best
and most suitable one, and I don't see a conflict with what it's meant
for either.

And actually, doing an extra request to find out whether a Link: to a
metalink could be provided sounds superfluous to me. What's more, when I
do a GET request, I get a response with the resource anyway, so I (as a
client) would rather have to do an extra HEAD request to discover such
Link: headers.

This would seem to me about as useful as if, in language variant
negotiation, a web browser first did a separate request to discover
the variants and, after choosing one, did the real request.
This just wouldn't fly, would it?

This might not make a noteworthy difference on a smallish server, but in
a large-scale environment, a doubling of requests makes a real difference.
For me, this is not just theory. I have to deal with 15 to 40 million
requests per day on one server and I don't want to double those.

And while server load is one thing in these matters, client response
latency is another. Faraway clients would get notably worse response
times when they have to do two requests. The latency of overseas
connections, and even more so of those to countries with bad Internet
connectivity, satellite links etc., always causes painful delays.
And the additional bandwidth used would not make it better either.

An additional Link header would be a good idea, though. I would happily
support this. It could be an interesting option.

Thank you very much for your thoughts and insight!

> Let me know if I can help in any way.
>
> EHL
>
> [1] http://tools.ietf.org/html/draft-nottingham-http-link-header-03
> [2] http://tools.ietf.org/html/draft-hammer-discovery-01
> [3] http://www.hueniverse.com/hueniverse/2008/09/discovery-and-h.html
>
>
> On Jan 16, 4:13 pm, Anthony Bryan <anthonybr...@gmail.com> wrote:
> > Eran Hammer-Lahav and Mark Nottingham have informed me that using
> > transparent content negotiation for serving a "description" of a file,
> > and not an alternative version (like PNG vs JPG) of the same thing has
> > been ruled against by the W3C TAG. see
> > http://esw.w3.org/topic/FindingResourceDescriptions
> >
> > "Other ways of getting a description through HTTP
> >     * Use content negotiation. If you ask for RDF, you get the
> > description. If you ask for something else, you get the thing
> > described. (The TAG, TimBL, and others have pointed out that this
> > contradicts web architecture, which requires that content negotiation
> > choose among things that all carry the same information. That goes for
> > CN between RDF and HTML as much as it does for CN between GIF and
> > JPEG.)"
> >
> > the correct, web architecture complient way to do this is apparently
> > the HTTP Link header:
> >
> > Link: <http://example.com/resource.metalink>; rel="describedby";
> > type="application/metalink+xml";
> >
> > http://tools.ietf.org/html/draft-nottingham-http-link-header-03
> > http://tools.ietf.org/html/draft-hammer-discovery-01


Peter
--
Contact: ad...@opensuse.org (a.k.a. ftpa...@suse.com)
#opensuse-mirrors on freenode.net
Info: http://en.opensuse.org/Mirror_Infrastructure

Eran Hammer-Lahav

Feb 17, 2009, 6:52:19 PM
to Metalink Discussion
To get past the "web theory" debate on conflating a resource and its
metadata into one URI, all you have to do is make a case for Metalink
to be defined as a representation of the resource and not an
external descriptor of it. Do that and you are aligned with most
people in this space. The idea that Metalink is nothing more than an
extremely degraded representation of the file itself is reasonable (to
me).

The right solution here is to support both link-based discovery and
content negotiation. This gives you full coverage of both client and
server capabilities, and allows for optimizations. Links can tell a
client that the site supports Metalink in the first place. Remember
that most sites will not support it and links are a great way to
declare that.

Also, I can envision many cases where the end-user uses a browser to
find their way to a page about a download. Since the browser already got
that HTML page, it can look for either a Link header or a link element
and, if present, show a special UI for the user to get a better download
experience (maybe turn the download button into a multi-choice menu).

In addition, /host-meta (the third method described in my discovery
proposal) gives an entire site the ability to declare support for
Metalinks and a way to map from the file URI to the Metalink
information.

So I guess my question is, why not support both links and content
negotiation?

EHL

Peter Poeml

Feb 17, 2009, 7:00:17 PM
to metalink-...@googlegroups.com
Hi!

On Thu, Feb 05, 2009 at 02:34:19PM -0800, Nils wrote:
> On Jan 20, 1:05 am, Anthony Bryan <anthonybr...@gmail.com> wrote:

[...]


> > Eran started a thread about TCN on the HTTP list at
> > http://lists.w3.org/Archives/Public/ietf-http-wg/2009JanMar/0014.html
> > (it wouldn't hurt the draft process for metalink people to be involved
> > on there :) which includes Mark's reply:
> >
> > "To my knowledge, caching intermediaries haven't deployed it (i.e.,
> > they'll work with TCN, but they won't be able to serve negotiated
> > requests from cache... somebody please correct me if I'm wrong).

One caching intermediary that perfectly supports this is Apache's
disk_cache. As an example, you could load the mirror list from
http://mirrors.opensuse.org/ and you'll get a gzipped version if your
browser indicates so, and a plain version if not. Both are correctly
cached according to the set Expires, and served from the cache. If you
refresh them and cause a cache miss, you'll get a newly compressed, or
newly generated plain, version from the origin server (which is the same
Apache). (This is mostly used locally, or for taking load off backends.
And it has come a long way; Apache 2.2 is required.)

Another one is squid -- even old versions of squid support this just
fine. Squid is much more common as intermediary proxy of course.

There may be (many) other caching intermediary proxies that I am not
aware of and haven't worked with -- I don't know. Anyhow, transparent
negotiation of gzip encoding is so widely used that we can probably
safely assume that it is handled in most cases. Too many highly popular
web sites use it.

And ISPs doing interception caching, and not doing it right in this
regard, would pretty soon be out of business I think.

> > I'm not sure about browser implementation, but I did a quick check of
> > the request headers seen by a very high-traffic Web site, and a
> > vanishingly small number contained the Negotiate header..."
>
> A usual website usually has not much to negotiate...
> Likely only image representations, if there are even different
> available.
>
> What you will likely see far more often is:
> Vary: Accept-Encoding
> A lot of sites offer either "plain" or gzip encoding. This is, as far
> as I know, relatively widely deployed.
> Hence I guess (but have no concrete data to undermine that guess) that
> most proxies today in fact have no problem dealing with this.

It is quite commonplace to add a "Vary: Accept-Encoding" header when
compressing content. It is not as well documented as it should be, but
admins usually find out about it pretty quickly. Luckily, the failure is
obvious: users soon notice that they get "garbage", the cause is easy
enough to track down, and the fix is trivial.
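
To illustrate why the Vary header matters to a cache, here is a minimal
sketch in Python. It is not how Apache's disk_cache or Squid are actually
implemented; the function name cache_key and the stored bodies are made
up for the example. The idea is only that the cache key includes the
request headers named in Vary, so the gzipped and the plain variant are
stored side by side.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    varied = []
    for name in vary.split(","):
        name = name.strip().lower()
        if name:
            # A missing request header is treated as an empty value.
            varied.append((name, headers.get(name, "")))
    return (url, tuple(sorted(varied)))


# Hypothetical cache contents for one URL served with "Vary: Accept-Encoding".
cache = {}
vary = "Accept-Encoding"
url = "http://mirrors.opensuse.org/"

cache[cache_key(url, {"Accept-Encoding": "gzip"}, vary)] = "gzipped body"
cache[cache_key(url, {}, vary)] = "plain body"
```

Without the Vary header the key would be the URL alone, and a client that
cannot decode gzip could be handed the compressed variant. Real caches
normalise the header values further; this sketch compares them verbatim.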

Peter Poeml

Feb 17, 2009, 8:02:19 PM2/17/09
to metalink-...@googlegroups.com
Hi Eran,

Thank you very much for your educated and detailed thoughts.

Thanks also for being present here on this list; I have subscribed to
the ietf-http-wg list to get a better interconnection by being present
there myself. Also to learn.

I have a bit of difficulty seeing the use case for the Link header right
now (although I'm fully supportive of it as another option), because in
my case (metalink generator running on download server), any HTML page
that the user might be looking at with a browser is typically running on
a different server -- often even outside of my control -- which doesn't
support Metalinks itself. I have not dug deeply enough into the Link
header matter to fully understand it, but this seems like a complication
to me.

In addition, the fact that the metalink is transparently negotiated
allows for two further particularities -- one not unimportant, the other
crucial:
1) efficiency: for a small resource, let's say 512 bytes in size, it
   would be inefficient to construct a metalink for it, or even to
   HTTP-redirect the request, because returning the resource directly
   transfers about the same amount of data to the client as the
   metalink or the redirect would. The server response might even fit
   into a single TCP payload together with the headers.
2) security: the server can decide to return certain resources
   directly for security reasons.

There can also be exceptions of other kinds; e.g. one part of the URL
space, or objects of a certain mime type, or objects matching a certain
file pattern, being redirected (HTTP 30x + Location header) to another
server, the latter not being metalink capable.
All these examples are not made up -- I could show you real Apache
config files. :)
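
A rough sketch of that per-request decision, in Python rather than Apache
configuration. Everything here is an assumption made for illustration:
the function name, the 2048-byte threshold, the idea that signature and
checksum files are the ones returned directly, and the /legacy/ prefix
standing in for a redirected part of the URL space. The Accept check is a
plain substring test and ignores q-values.

```python
import fnmatch

METALINK_TYPE = "application/metalink+xml"


def choose_response(path, size, accept, min_size=2048,
                    direct_patterns=("*.asc", "*.md5", "*.sha1"),
                    redirect_prefixes=("/legacy/",)):
    """Return 'redirect', 'direct' or 'metalink' for one request."""
    # Part of the URL space is handed off to another, non-metalink server.
    if any(path.startswith(prefix) for prefix in redirect_prefixes):
        return "redirect"
    # Clients that did not ask for a metalink always get the file itself.
    if METALINK_TYPE not in accept:
        return "direct"
    # Efficiency: a metalink for a tiny file saves nothing.
    if size < min_size:
        return "direct"
    # Security (assumed policy): some files are always served directly.
    if any(fnmatch.fnmatchcase(path, pattern) for pattern in direct_patterns):
        return "direct"
    return "metalink"
```

The point is that the answer depends on the individual request, which is
why it can only be known after running through these checks.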

This makes it look unfeasible to me to declare metalink capability
globally. Of course, the server could maybe, eventually, do a metalink
for all the resources; but for a HEAD request or to generate a Link
header it would have to run through the full request processing phase to
decide on it.

All in all, the Link header seems to me to be more suitable for related
(other) resources. For instance, I can see a use case for adding a Link
header, for foobar, to a foobar.md5 and foobar.asc and foobar.sha1 and
foobar.torrent resource. From there it would be a logical step to say
that foobar.metalink would also be such a resource - yes. However, how
many Link headers can I practically add without exceeding the size of a
single TCP packet? It would not be practical as far as I can see. It
makes sense only in selected cases I think. On the contrary, a Metalink
already _is_ a "directory" of such related resources in a way which
encompasses them together in a format that can be handled in an
efficient way. Like a hundred Link headers at a time ;) Looks like a
similar effort, doesn't it? ;)

I'm new to this, and happy to learn more.

Thanks,

Nicolas Alvarez

Feb 19, 2009, 9:45:47 AM2/19/09
to metalink-...@googlegroups.com
Nils wrote:

Do a GET. If there is no Link, continue downloading from it. If there is a
Link for a .metalink, download the .metalink, parse it, and... you will
probably find the original URL is one of the mirrors. So you never have to
abort the first GET, just keep using it as a download source.
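
A minimal sketch of that client logic in Python, working on response
headers that have already been received (no network code). The function
names are made up, and it is an assumption that the server marks the
metalink with type="application/metalink+xml" in the Link header; the
thread does not fix a relation type or exact syntax. The small regex does
not handle quoted values that contain commas or semicolons.

```python
import re

METALINK_TYPE = "application/metalink+xml"
# Matches "<target>" followed by any number of ";param" parts.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def find_metalink(link_header):
    """Return the first link target whose type is Metalink, or None."""
    for target, params in LINK_RE.findall(link_header or ""):
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "type" and value.strip().strip('"') == METALINK_TYPE:
                return target
    return None


def plan_download(url, response_headers):
    """Decide what to do once the first GET has returned its headers."""
    metalink_url = find_metalink(response_headers.get("Link"))
    # The GET that is already running stays in use as a source either way;
    # mirrors from the metalink would be added after it is fetched and parsed.
    return {"sources": [url], "metalink": metalink_url}
```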


Eran Hammer-Lahav

Mar 3, 2009, 2:15:21 AM3/3/09
to metalink-...@googlegroups.com
Hi Peter,

I want to reiterate that I am not suggesting replacing the content of a Metalink document with a set of links. Just the opposite. I think Metalink is a perfect example of a resource descriptor with very specific use cases (as opposed to more generic descriptors such as POWDER and XRD). I am also not objecting to the use of content negotiation. It is your prerogative to define Metalink as a valid representation of a file resource. Either way, this is more of a philosophical discussion than a technical one and I think we are past that.

What I am suggesting is that links can offer similar functionality. <LINK> elements are not valid in this case because the HTML page is not likely to be the file being downloaded (unless you consider the HTML page another representation of the same resource, but that is a stretch in this case). This leaves us with HTTP Link headers, which can be obtained from a HEAD request on the file URI, and Link-Pattern records in the site's /host-meta file.

The last option is interesting because it allows an entire download server to declare how to obtain a Metalink for any file within its authority (host). Yes, if you only download one file, you need two round trips. But if you are downloading multiple files from the same server, you can cache the /host-meta information and go directly to the Metalink descriptor for any given download.
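
As a rough illustration of the caching argument, here is a small Python sketch. It does not parse a real /host-meta file; the class name, the fetch_pattern callable, and the "{uri}" placeholder in the pattern are all assumptions made for the example. It only shows that the per-host lookup happens once and every later file on the same host goes straight to its Metalink URL.

```python
from urllib.parse import urlsplit


class HostMetaCache:
    """Remember one Metalink URL pattern per host (hypothetical helper)."""

    def __init__(self, fetch_pattern):
        # fetch_pattern: callable taking a host name and returning a pattern
        # string such as "{uri}.metalink", or None if the host declares none.
        self._fetch_pattern = fetch_pattern
        self._patterns = {}

    def metalink_url(self, file_url):
        host = urlsplit(file_url).netloc
        if host not in self._patterns:
            # First file on this host: one extra round trip for /host-meta.
            self._patterns[host] = self._fetch_pattern(host)
        pattern = self._patterns[host]
        if pattern is None:
            return None
        return pattern.replace("{uri}", file_url)


# Stand-in for the network fetch, recording how often it is called.
calls = []


def fake_fetch(host):
    calls.append(host)
    return "{uri}.metalink"


cache = HostMetaCache(fake_fetch)
first = cache.metalink_url("http://download.example.org/pub/a.iso")
second = cache.metalink_url("http://download.example.org/pub/b.iso")
```

After both calls, fake_fetch has been invoked only once for download.example.org, which is the saving described above.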

One more thing that links can offer you is the ability to support Metalink in shared hosting environments, where the service is not likely to give you access to content negotiation configuration on the server. If you can drop in a /host-meta file, you can bypass that and still support Metalink. This of course requires that the spec tell clients to look for such links if the Accept header approach fails.

Now, the design requirements for the link framework I'm proposing are much more restrictive than what I assume applies to Metalink. For example, with Metalink:

1. Clients can be assumed to have full access to the full HTTP feature set including content negotiation.
2. There is no such thing as a Metalink of a Metalink file (i.e. second derivative).
3. Metalink is not useful as a format for anything else. It is always associated with a file which is the primary focus.
4. File servers are willing to support content negotiation for file downloads.

None of these hold for my own use cases. For example, Yahoo! will not allow using content negotiation on key properties such as the front page, but I still need to associate a descriptor with it. Yes, it is unlikely that Y!'s front page will even support or benefit from Metalink. Also, many of the platforms I need to support, such as JavaScript, Flash, and old versions of PHP, do not give clients easy access to some HTTP features. And many hosting services will not allow users to set up content negotiation for their files.

So my suggestion is for you to keep what you have, and consider if the value of what I am proposing is worth including as a secondary discovery mechanism.

EHL
