Mirror URIs & MetaMirror Servers


#1 $tunna

Aug 3, 2007, 5:56:18 AM
to Metalink Discussion
hello all,

I am the author of the "Mirror URIs" draft document[1] which Metalink
parallels greatly. the original document was released publicly in
July of 2002.

archives can be viewed via the WayBack Machine & the p2p-hackers
mailing list:

[1] Mirror URIs - a URL Scheme for Handling Data Mirrors (WayBack
Machine)
http://web.archive.org/web/20051231182617/http://www.shouldexist.org/?op=displaystory;sid=2002/7/12/53049/1427;mode=mo

[1] Subject: [p2p-hackers] Mirror URIs
http://notabug.com/zest/p2p-hackers/msg00046.html


below is a brief brainstorm about the Metalink project & suggestions
to extend / improve it:


the initial idea was to use XML files & also a new URI scheme.
I also planned an open source database server w/ a web UI similar in
functionality to ShareURL.com & FileMirrors.com. this project was
codenamed Refl3xion[2]


[2] WayBack Machine Archive - Refl3xion (Initial Idea)
http://web.archive.org/web/20030131030432/sourceforge.net/docman/display_doc.php?docid=11336&group_id=53762

(I have seen Bouncer & MetaMirrors.NL, both look promising.)


fast forward 5 years & I am ecstatic about the support that this
project currently has. however, I think for Metalinks to take off in
the mainstream, Metalink usage should be as unobtrusive & transparent
as possible to the end-user.

how can this be achieved?

my proposal: a new URI scheme & metalink database servers

I am proposing the usage of the following URI scheme:

mirror://<hashtype>=<hash>
mirror://filename&<hashtype>=<hash>
mirror://filename&<hashtype>=<hash>&ml=<metalink_file_URL>

examples:

mirror://sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
mirror://example_data_file.txt&sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
mirror://example_data_file.txt&sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&ml=http://server.tld/example_data_file.metalink
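
as an illustration only, the three forms above could be parsed roughly as
follows. this is a sketch under assumptions: the `MirrorURI` class and
`parse_mirror_uri` function are hypothetical names, and the rule that
`&ml=` is always the last component is my reading of the examples, not
part of any spec.

```python
from dataclasses import dataclass
from typing import Optional

SCHEME = "mirror://"


@dataclass
class MirrorURI:
    hash_type: str
    hash_value: str
    filename: Optional[str] = None
    metalink_url: Optional[str] = None


def parse_mirror_uri(uri: str) -> MirrorURI:
    """Parse one of the three proposed mirror:// forms (hypothetical helper)."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a mirror URI: %r" % uri)
    rest = uri[len(SCHEME):]

    # Split off the optional metalink URL first, because that URL
    # may itself contain '&' characters.
    metalink_url = None
    if "&ml=" in rest:
        rest, metalink_url = rest.split("&ml=", 1)

    # What remains is "<hashtype>=<hash>" or "filename&<hashtype>=<hash>".
    filename = None
    if "&" in rest:
        filename, rest = rest.rsplit("&", 1)

    hash_type, sep, hash_value = rest.partition("=")
    if not sep or not hash_type or not hash_value:
        raise ValueError("missing <hashtype>=<hash> in %r" % uri)
    return MirrorURI(hash_type.lower(), hash_value, filename, metalink_url)
```

a handler registered for the scheme could call `parse_mirror_uri` on the
clicked link and then use the hash to look up mirrors.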

a client-side handler will interact w/ MetaMirror servers to locate
additional mirrors.
there should be options on the client end to specify MetaMirror server
IPs / hostnames manually or auto-update them from a web source,
MetaMirror server, or Metalink file.

mirror URIs & Refl3xion / MetaMirror servers will work in concert.
when an end-user clicks a mirror URI, a mirror URI handler on the
client will query a MetaMirror server for the corresponding data hash
& the server will return all mirrors back to the client. additionally,
such clients should be able to submit mirror URLs w/ corresponding
hashes to MetaMirror servers.
this type of setup is only feasible if there is no single point of
failure for MetaMirror servers, i.e. there are numerous servers
available across the web.

additionally, there should be a mechanism in place for MetaMirror
servers to share data between each other (mirror lists sync)
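
a rough sketch of the lookup / submit / sync flow described above.
everything here is hypothetical: the `MetaMirrorServer` class, its method
names, and the in-memory storage are stand-ins for a real networked
server, and no wire protocol has been defined.

```python
from collections import defaultdict


class MetaMirrorServer:
    """Hypothetical in-memory stand-in for a MetaMirror server."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        # (hash_type, hash_value) -> set of mirror URLs
        self._mirrors = defaultdict(set)

    def _check_online(self):
        if not self.online:
            raise ConnectionError("%s is unreachable" % self.name)

    def submit(self, hash_type, hash_value, url):
        """Record a mirror URL for the given data hash."""
        self._check_online()
        self._mirrors[(hash_type, hash_value)].add(url)

    def lookup(self, hash_type, hash_value):
        """Return all known mirror URLs for the given data hash."""
        self._check_online()
        return sorted(self._mirrors.get((hash_type, hash_value), ()))

    def sync_from(self, other):
        """Merge another server's mirror lists into this one (set union)."""
        for key, urls in other._mirrors.items():
            self._mirrors[key] |= urls


def find_mirrors(servers, hash_type, hash_value):
    """Try each configured server in turn; the first reachable one answers."""
    for server in servers:
        try:
            return server.lookup(hash_type, hash_value)
        except ConnectionError:
            continue
    raise ConnectionError("no MetaMirror server reachable")
```

with several synced servers in the client's list, losing one of them does
not stop `find_mirrors` from returning results.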

servers should have mechanisms to police themselves
i.e. shared blacklists that will identify servers & clients by IP that
send malicious or invalid data on a consistent basis. I envision a
scheme similar to the one implemented in Vipul's Razor[3]

[3] http://razor.sourceforge.net/
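
one simple way to picture such a blacklist is sketched below. this is NOT
Razor's actual algorithm; it is a minimal stand-in that counts valid and
invalid submissions per IP. the `ReputationList` class and both
thresholds are assumptions chosen for illustration.

```python
from collections import Counter


class ReputationList:
    """Hypothetical per-IP tally of valid vs. invalid submissions."""

    def __init__(self, min_reports=5, max_invalid_ratio=0.5):
        self.min_reports = min_reports              # ignore IPs with little history
        self.max_invalid_ratio = max_invalid_ratio  # above this ratio -> blacklisted
        self._valid = Counter()
        self._invalid = Counter()

    def record(self, ip, valid):
        """Note one submission from `ip` and whether it checked out."""
        if valid:
            self._valid[ip] += 1
        else:
            self._invalid[ip] += 1

    def is_blacklisted(self, ip):
        total = self._valid[ip] + self._invalid[ip]
        if total < self.min_reports:
            return False
        return self._invalid[ip] / total > self.max_invalid_ratio

    def export_blacklist(self):
        """List that a server could share with its peers."""
        ips = set(self._valid) | set(self._invalid)
        return sorted(ip for ip in ips if self.is_blacklisted(ip))
```

requiring a minimum number of reports keeps a single bad submission from
blacklisting a host outright.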

I recommend an option on the client side to create ad-hoc p2p swarms
(cooperative downloading)
ATM, the only method I have thought of to achieve this is a server
option to cache previous download requests & offer the list to clients
that support cooperative downloading

this may raise privacy & legal issues for some, but if anyone knows a
better way to offer this option please respond

once again, these will be options: not the default behavior of client
or server
also, this option represents an additional area where the
Razor-style spam / invalid hosts list comes into play

I think the combination of mirror URIs & a large number of Refl3xion /
MetaMirror servers will give the Metalink project a great push into
mainstream usage. I strongly believe the convergence of Metalinks,
BitTorrent, & BitTorrent WebSeeding is the future of high speed
distribution.

if these three technologies could interoperate, I believe they would
offer almost limitless potential.

BitTorrent WebSeeding & Metalinks are currently in their infancy, but
they offer great options for content providers, especially if they
could be used in concert.

after reading about the Open Search specification, I thought of a
method for browsers to auto-discover data mirrors (see basic idea
below)

Auto-discovery of mirrors (useful for implementations on YouTube-style
sites, etc)
(will require browser plugin to recognize the mirrors & handle multi-
source streaming)

a web page that offers download mirrors can advertise it so that
browsers can access the mirrors transparently to the end-user.

To support autodiscovery, the following line should be added to the
<head> section of a web page:

<link rel="mirrors" type="application/metamirrors+xml"
href="metalinkURL">


metalinkURL
The URL to the .metalink file which lists additional mirrors the
browser can download from
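
to show what a plugin would have to do with that tag, here is a small
sketch using Python's standard `html.parser`. the `rel="mirrors"` value
comes from the proposal above; the `MirrorLinkFinder` class and
`discover_metalinks` function are hypothetical names, and resolving
relative `href` values against the page URL is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MirrorLinkFinder(HTMLParser):
    """Collect href values of <link rel="mirrors"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.metalink_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "mirrors" in rel_tokens and href:
            self.metalink_urls.append(urljoin(self.base_url, href))


def discover_metalinks(html, base_url):
    """Return absolute URLs of all advertised .metalink files on a page."""
    finder = MirrorLinkFinder(base_url)
    finder.feed(html)
    return finder.metalink_urls
```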


I have a few questions about BitTorrent & Metalinks:

how are torrents handled by Metalink files?
how are BitTorrent info hashes handled within Metalink files?
how are multi-file torrents handled?
does Metalink support multiple files (i.e. directories)?
is there currently a torrent client that supports Metalinks?
(I have read a little on aria2, but not in-depth...)

I have an idea to commission a proof-of-concept implementation using
Azureus (since it is open source & popular)

it will entail the following tasks:

1. mirror URI support
2. BitTorrent WebSeed support (GetRight spec[4]) - including creation
of WebSeed torrents (does not exist AFAIK)
3. Metalink file support (HTTP & BitTorrent only)

[4] http://www.getright.com/seedtorrent.html

I am not a Java coder, so I will hire someone for that project.
if anyone here is available for the position please let me know.

the Refl3xion project will be re-opened on SourceForge in the near
future.
the codebase will be released under the GPL.
I invite any developers here to join the project if you are available.

feedback & insight are needed.
please advise.
thank you.

-$tunna
The War Room
(http://TheWarRoom.info)

Anthony Bryan

Aug 4, 2007, 6:06:50 AM
to metalink-...@googlegroups.com
Hi #1 $tunna,

On 8/3/07, #1 $tunna <stu...@gmail.com> wrote:
>
> hello all,
>
> I am the author of the "Mirror URIs" draft document[1] which Metalink
> parallels greatly. the original document was released publicly in
> July of 2002.
>
> archives can be viewed via the WayBack Machine & the p2p-hackers
> mailing list:
>
> [1] Mirror URIs - a URL Scheme for Handling Data Mirrors (WayBack
> Machine)
> http://web.archive.org/web/20051231182617/http://www.shouldexist.org/?op=displaystory;sid=2002/7/12/53049/1427;mode=mo
>
> [1] Subject: [p2p-hackers] Mirror URIs
> http://notabug.com/zest/p2p-hackers/msg00046.html

Nice!

>
> below is a brief brainstorm about the Metalink project & suggestions
> to extend / improve it:
>
>
> the initial idea was to use XML files & also a new URI scheme.
> I also planned an open source database server w/ a web UI similar in
> functionality to ShareURL.com & FileMirrors.com. this project was
> codenamed Refl3xion[2]
>
>
> [2] WayBack Machine Archive - Refl3xion (Initial Idea)
> http://web.archive.org/web/20030131030432/sourceforge.net/docman/display_doc.php?docid=11336&group_id=53762
>
> (I have seen Bouncer & MetaMirrors.NL, both look promising.)
>
>
> fast forward 5 years later & I am ecstatic about the support that this
> project currently has, however I think for Metalinks to take off in
> the mainstream, Metalink usage should be as unobtrusive & transparent
> as possible to the end-user.
>
> how can this be achieved?
>
> my proposal: a new URI scheme & metalink database servers
>
> I am proposing the usage of the following URI scheme:
>
> mirror://<hashtype>=<hash>
> mirror://filename&<hashtype>=<hash>
> mirror://filename&<hashtype>=<hash>&ml=<metalink_file_URL>
>
> examples:
>
> mirror://sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
> mirror://example_data_file.txt&sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
> mirror://example_data_file.txt&sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&ml=http://server.tld/example_data_file.metalink

Thanks for sharing your ideas! I'm not sure a new URI scheme will help
make metalink more mainstream/transparent, but if that's what's
required then cool :). Refl3xion is a great code name btw!

> a client-side handler will interact w/ MetaMirrors servers to locate
> additional mirrors.
> there should be options on the client end to specify MetaMirror server
> IPs / hostnames manually or auto-update them from a web source,
> MetaMirror server, or Metalink file.

Sounds good. Metalink clients should already be checking the "origin"
URL for updated mirrors/info for "dynamic" metalinks.

> mirror URIs & Refl3xion / MetaMirror servers will work in concert.
> when an end-user clicks a mirror URI a mirror URI handler on the
> client will search a MetaMirror server for the corresponding data hash
> & return all mirrors back to the client. additionally, such clients
> should be able to submit mirror URLs w/ corresponding hashes to
> MetaMirror servers.
> this type of setup is only feasible if there is no single point of
> failure for MetaMirror servers. i.e, there are numerous servers
> available across the web.
>
> additionally, there should be a mechanism in place for MetaMirror
> servers to share data between each other (mirror lists sync)

That would be great if there were redundant MetaMirror servers and
they could coordinate w/ each other.

> servers should have mechanisms to police themselves
> i.e. shared blacklists that will identify servers & clients by IP that
> send malicious or invalid data on a consistent basis. I envision a
> scheme similar to the one implemented in Vipul's Razor[3]
>
> [3] http://razor.sourceforge.net/
>
> I recommend an option on the client side to create ad-hoc p2p swarms
> (cooperative downloading)
> ATM, the only method I have thought of to achieve this is a server
> option to cache previous download requests & offer the list to clients
> that support cooperative downloading

Yes, this is planned eventually.

Yes, this is planned for autodiscovery, but not implemented in clients yet.

> I have a few questions about BitTorrent & Metalinks:
>
> how are torrents handled by Metalink files?

Either as a <url type="bittorrent"> (supported) or with the torrent
info in the XML (unsupported so far AFAIK).
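
For illustration, a client could pick out the torrent entries of a
metalink roughly like this. It is a sketch that assumes the Metalink 3.0
layout (files / file / resources / url) and the
http://www.metalinker.org/ namespace; the sample document, its URLs, and
the `urls_by_type` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {"ml": "http://www.metalinker.org/"}  # assumed Metalink 3.0 namespace

# Made-up sample document; host names are placeholders.
SAMPLE = """<metalink version="3.0" xmlns="http://www.metalinker.org/">
  <files>
    <file name="example.iso">
      <resources>
        <url type="http">http://mirror1.example/example.iso</url>
        <url type="bittorrent">http://example.org/example.iso.torrent</url>
      </resources>
    </file>
  </files>
</metalink>
"""


def urls_by_type(xml_text):
    """Map each file name to {url type: [urls]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for file_el in root.findall("ml:files/ml:file", NS):
        per_type = {}
        for url_el in file_el.findall("ml:resources/ml:url", NS):
            url = (url_el.text or "").strip()
            per_type.setdefault(url_el.get("type"), []).append(url)
        result[file_el.get("name")] = per_type
    return result
```

A client with torrent support would hand the "bittorrent" URLs to its
torrent engine and treat the rest as ordinary mirrors.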

> how are BitTorrent info hashes handled within Metalink files?

Chunk/piece/segment checksums? The same way they're handled for other
files in metalinks.

> how are multi-file torrents handled?

Probably not very well in practice, but each file can be listed in the
metalink with its own link to the torrent.

> does Metalink support multiple files (i.e. directories)?

Yes, multiple files (all clients?) and directories (only aria2) are supported.

> is there currently a torrent client that supports Metalinks?
> (I have read a little on aria2, but not in-depth...)

GetRight works the best; it integrates torrent/ftp/http. aria2 also
supports torrents, but separately I believe. KGet will also eventually
integrate torrent/ftp/http. Celerius also plans to do that.

> I have an idea to commission a proof-of-concept implementation using
> Azureus (since it is open source & popular)
>
> it will entail the following tasks:
>
> 1. mirror URI support
> 2. BitTorrent WebSeed support (GetRight spec[4]) - including creation
> of WebSeed torrents (does not exist AFAIK)
> 3. Metalink file support (HTTP & BitTorrent only)
>
> [4] http://www.getright.com/seedtorrent.html

That sounds great. I believe Az supports #2, except maybe not creation
of them (unless that's what you mean).

> I am not a Java coder, so I will hire someone for that project.
> if anyone here is available for the position please let me know.

I don't know if he is available, but Bram Neijt might be able to work on that.

> the Refl3xion project will be re-opened on SourceForge in the near
> future.
> the codebase will be released under the GPL.
> I invite any developers here to join the project if you are available.
>
> feedback & insight is needed.
> please advise.
> thank you.

Cool, this all seems like great stuff!

I'd only like to add, more people/projects need to hear about metalink
because many don't. Anyone who's distributing large files and wants
error correction should be using it. I also think some more general
download sites need to use it for it to catch on. You only need to
download ISOs and OO.org so often :) The client support is there.
People can use DTA! w/ Firefox, but I'd also really like to see
support in Opera & Safari.

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
)) Easier, More Reliable, Self Healing Downloads

#1 $tunna

Aug 6, 2007, 8:46:31 AM
to Metalink Discussion
> how are BitTorrent info hashes handled within Metalink files?

> Chunk/piece/segment checksums? The same way they're handled for other
> files in metalinks.

when a torrent is uploaded to a tracker the torrent has an info hash.
this info hash is the same on all trackers if the torrent has not been
modified.
(if I am not mistaken)

if this is the case, an info hash could be used to identify identical
torrents.
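
to make the point concrete: in BitTorrent (v1) the info hash is the SHA-1
of the bencoded "info" dictionary, so the tracker URL does not affect it.
the sketch below uses a minimal hand-written bencoder; `bencode` and
`info_hash` are illustrative helpers, not a library API. a real client
should hash the raw "info" bytes from the .torrent file rather than
re-encode them.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, strings/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted order.
        encoded = {
            (k.encode("utf-8") if isinstance(k, str) else k): v
            for k, v in value.items()
        }
        body = b"".join(bencode(k) + bencode(encoded[k]) for k in sorted(encoded))
        return b"d" + body + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(torrent):
    """SHA-1 over the bencoded 'info' dictionary only."""
    return hashlib.sha1(bencode(torrent["info"])).hexdigest()
```

two torrents that differ only in their "announce" entry produce the same
`info_hash`, which is what would let a server treat them as identical.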


> how are multi-file torrents handled?

> Probably not very well in practice, but each file can be listed in the
> metalink with its own link to the torrent.

each file within a torrent with its own metalink?
the actual .torrent file contains all pertinent information about its
corresponding data.
why not use that information?


Aren Olson

Aug 6, 2007, 3:53:50 PM
to metalink-...@googlegroups.com

It can. As I understand it, metalink can either use an external
torrent file, or embed the torrent info into the metalink. Multiple
files are handled in either case.

The new download manager I'm working on now, Celerius, will eventually
have full support for both bittorrent and metalink (among other
things), but it may take a while to get there. If your new URI scheme
ever gets anywhere I'll add it too.

ttyl,
Aren

#1 $tunna

Aug 7, 2007, 1:02:06 PM
to Metalink Discussion
example:

a torrent contains 20 top level directories (not subdirectories of a
root folder)
how is this handled within a Metalink file?

there are currently two paid developers working on mirror URIs &
Refl3xion.
I was hoping to get feedback from the community.....

Nicolas

Sep 21, 2007, 9:30:26 PM
to Metalink Discussion
On Aug 3, 6:56 am, #1 $tunna <stu...@gmail.com> wrote:
> I am proposing the usage of the following URI scheme:
>
> mirror://<hashtype>=<hash>
> mirror://filename&<hashtype>=<hash>
> mirror://filename&<hashtype>=<hash>&ml=<metalink_file_URL>
>
> examples:
>
> mirror://sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
> mirror://example_data_file.txt&sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
> mirror://example_data_file.txt&sha1=YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&ml=http://server.tld/example_data_file.metalink

I see no need to add yet another URI scheme incompatible with
everything existing. Magnet links already identify things like that:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C

(interesting that the sample magnet link shown everywhere has the same
hash as your samples)
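
To show how close the two notations are, here is a small sketch that
reads the hash out of a magnet link and rewrites it in the proposed
mirror:// form. The function names are hypothetical, and only the simple
`xt=urn:<hashtype>:<hash>` shape from the example above is handled.

```python
from urllib.parse import urlparse, parse_qs


def magnet_to_hash(magnet):
    """Return (hash_type, hash_value) from a magnet link's xt parameter."""
    parsed = urlparse(magnet)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link: %r" % magnet)
    for xt in parse_qs(parsed.query).get("xt", []):
        parts = xt.split(":")
        # Expect exactly urn:<hashtype>:<hash>
        if len(parts) == 3 and parts[0] == "urn":
            return parts[1].lower(), parts[2]
    raise ValueError("no urn:<hashtype>:<hash> in %r" % magnet)


def magnet_to_mirror_uri(magnet):
    """Rewrite the magnet link in the proposed mirror:// notation."""
    hash_type, hash_value = magnet_to_hash(magnet)
    return "mirror://%s=%s" % (hash_type, hash_value)
```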

Anyway, I prefer the way metalink does it with anchors:
http://mirror1/shiny-file.tar.bz2#!metalink3!http://mainsite/shiny-file.metalink

This is backwards-compatible with normal links. If the browser or some
extension supports it, it will download the .metalink and download
from many mirrors and do hash checking and all those other cool things
metalinks give us. Otherwise, it will just download from the normal
URL, stripping the anchor (all browsers do that). With a new URI
scheme, the user would get an error if they don't have any program that
can handle it. Which is more transparent and unobtrusive?
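
A sketch of the client side of that anchor convention, using the standard
`urllib.parse.urldefrag`. The `#!metalink3!` marker is taken from the
example above; the `split_metalink_anchor` helper is a hypothetical name.

```python
from urllib.parse import urldefrag

MARKER = "!metalink3!"


def split_metalink_anchor(url):
    """Return (plain_url, metalink_url); metalink_url is None if absent."""
    plain_url, fragment = urldefrag(url)
    if fragment.startswith(MARKER):
        return plain_url, fragment[len(MARKER):]
    return plain_url, None
```

A metalink-aware client would fetch the second value when it is present;
any other client only ever sees the first.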

Anthony Bryan

Sep 22, 2007, 6:28:46 PM
to metalink-...@googlegroups.com
On 9/21/07, Nicolas <nicolas...@gmail.com> wrote:
> Anyway, I prefer the way metalink does it with anchors:
> http://mirror1/shiny-file.tar.bz2#!metalink3!http://mainsite/shiny-file.metalink
>
> This is backwards-compatible with normal links. If the browser or some
> extension supports it, it will download the .metalink and download
> from many mirrors and do hash checking and all those other cool things
> metalinks give us. Otherwise, it will just download from the normal
> URL, stripping the anchor (all browsers do that). Using a new URI
> scheme, user would get an error if he doesn't have any program that
> can handle it. Which is more transparent and unobtrusive?

unfortunately (I think it was a really cool way to do things)
#!metalink3! appended to URLs has been removed from the spec. it met
with general disapproval on the IETF HTTP list (they were about to
lynch me, it's a problem w/ # the fragment identifier).

this is also why link fingerprints [1] support will not be included in
Firefox 3.

http://example.org/somefile#hash(md5:b04abf1a9a3af8cfff32b330681fbcec)
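
a sketch of how a client might read and check a fingerprint of that
shape. the `#hash(<algo>:<hexdigest>)` pattern is inferred from the
single example above, not from the full Link Fingerprints draft, and the
helper names are hypothetical.

```python
import hashlib
import re
from urllib.parse import urldefrag

# Pattern inferred from the one example URL; the real draft may allow more.
FINGERPRINT = re.compile(r"^hash\((?P<algo>[A-Za-z0-9]+):(?P<digest>[0-9A-Fa-f]+)\)$")


def parse_fingerprint(url):
    """Return (plain_url, algo, digest); algo and digest are None if absent."""
    plain_url, fragment = urldefrag(url)
    match = FINGERPRINT.match(fragment)
    if match is None:
        return plain_url, None, None
    return plain_url, match["algo"].lower(), match["digest"].lower()


def matches_fingerprint(data, algo, digest):
    """Check downloaded bytes against the fingerprint from the link."""
    return hashlib.new(algo, data).hexdigest() == digest.lower()
```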

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
)) Easier, More Reliable, Self Healing Downloads

[1] http://bugs.code.downthemall.net/trac/wiki/LinkFingerprints

Nicolas

Sep 22, 2007, 8:07:38 PM
to Metalink Discussion
On Sep 22, 7:28 pm, "Anthony Bryan" <anthonybr...@gmail.com> wrote:
> unfortunately (I think it was a really cool way to do things)
> #!metalink3! appended to URLs has been removed from the spec. it met
> with general disapproval on the IETF HTTP list (they were about to
> lynch me, it's a problem w/ # the fragment identifier).

Do you have a link to the IETF comments? I would like to see the
reasons why they didn't like the idea.

Other ways:
<ideas-pulled-out-of-butt>

- A custom attribute in the <a> tag. Advantages: just as simple to
set up as the #!metalink3 system for the webmaster. Disadvantages:
makes the HTML page invalid.

- A custom HTTP header. Client makes a request for the normal download
URL using HEAD instead of GET, and reads the headers. Advantages:
Erm... Disadvantages: very complex for the server admin/webmaster to
set up, and how would a client know it has a metalink? It would have to
do the HEAD request for *all* downloads! Bad idea.

- Have the browser send application/metalink+xml on the Accept request
header, or maybe a custom header. Configure server to return a
metalink instead of the actual data if this header is there. If user
clicks a link, and it returns raw data (decided based on the mime type
as it would usually do), it would ask to download as usual. If the
mime type is application/metalink+xml, pass it along to the metalink
system instead. Advantages: same link, no need to make any second
request. Disadvantages: might be complex for the server
admin/webmaster to set up, particularly bad if the webmaster can't run
server-side scripts.

- Add a <link rel="metalink-list" href="..."> tag to the HTML page. If
it exists, have the browser (or extension) request via this URL which
of the links on the page contain metalinks, and replace them on the
fly, or just store the list internally and look it up when the user
clicks a link. The URL would be a static XML file containing a mapping
between raw data URL and metalink URL. Returning the metalinks
themselves inline in the XML means fewer requests (good for latency)
but bigger XML (bad for bandwidth). We could give both alternatives.
Advantages: Well... lacks all the disadvantages of the rest of the
ideas, easy to implement, no need for server-side scripts (of course a
script *can* be used but it's not required), HTML stays valid, no
latency problems for extra requests. If a website uses lots of
metalinks, it can just split the list, since the <link> is per page,
not per website. I doubt anybody would have hundreds of metalinks on
the same *page*. Disadvantages: Yet another file to load, but I doubt
it matters; there are so many images and stuff to load that one more
XML won't make a speed difference. Yet another file to keep up-to-date
on the
server. It would be better to keep the metalink close to the link
itself, instead of a URL-based mapping.

</ideas-pulled-out-of-butt>

I wrote them up while I thought of them. Last one seems to be the best
way. All seem better for the user than a custom URI scheme (they all
work if the browser doesn't have metalink support). Post comments
while I go cool my brain.
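
A rough sketch of the last idea (the `<link rel="metalink-list">` file).
The XML layout below is invented purely for illustration, since no format
has been defined: a `<metalink-list>` root with `<entry url="..."
metalink="..."/>` children. The function names are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Invented example of the mapping file; no such format has been specified.
SAMPLE_LIST = """<metalink-list>
  <entry url="http://mirror1/shiny-file.tar.bz2"
         metalink="http://mainsite/shiny-file.metalink"/>
</metalink-list>
"""


def load_metalink_map(xml_text):
    """Build {raw data URL: metalink URL} from the mapping file."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("url"): entry.get("metalink")
        for entry in root.findall("entry")
        if entry.get("url") and entry.get("metalink")
    }


def resolve_click(url, metalink_map):
    """Return the metalink URL for a clicked link, or the link itself."""
    return metalink_map.get(url, url)
```

Links without an entry fall through unchanged, so a page keeps working in
browsers that never load the list.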

#1 $tunna

Sep 23, 2007, 7:52:14 AM
to Metalink Discussion
Nicolas wrote on Sat, 22 Sep 2007 01:30:26 -0000:

> I see no need to add yet another URI scheme incompatible with
> everything existing. Magnet links already identify things like that:
> magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C

this will be implemented as an option, not a requirement.
I will try to reply to your other statements when I get time to read
them thoroughly.
(I just did a quick scan....)
I have posted about the "link rel" idea in this group also.

> (interesting that the sample magnet link shown everywhere has the same
> hash as your samples)

yes, I used the hash from the magnet URI sample page.
