Exciting for me indeed :-)
Now some questions for you guys, where I would appreciate a few hints:
> Things to do:
>
> - implement the rfc 3339 dates (still looking if Apache/libapr has
> infrastructure for that)
Do the clients only parse the rfc 3339 date, or do they also grok
something like
pubdate="Thu, 03 Apr 2008 09:05:25 GMT"
(more an rfc 822 example) or
pubdate="Mon Feb 4 14:43:24 2008"
?
The latter is taken from a real example:
http://download.packages.ro/metalink/openoffice/OOo_1_0_2_SolarisIntel_install_tar_gz.metalink
Or put the other way round, can all clients be expected to parse the
"2006-05-15T00:01Z" format from the spec?
I implemented it like this now:
% date
Thu Apr 3 16:33:47 CEST 2008
% curl -s 'http://localhost/zrkadlo/distribution/10.3/repo/oss/GPLv3.txt?clientip=202.72.191.2&metalink' | grep date
pubdate="2008-04-03T14:33Z"
refreshdate="2008-04-03T14:33Z">
This is meant to be GMT (I'm in the GMT+2 timezone).
Does that look right?
> - play with the score values, possibly put the scores of mirrors from
> the same country (or same region) higher?
> - Find out how clients deal with location and preference, if both are
> present, and which they give the higher weight
> - spec says, preference is a value of 1 to 100, while we use
> arbitrary positive integers
> - does the order of <url> elements in the metalink matter?
What do you think about this? What our redirector constructs is a list
of mirrors sorted by:
- region
- country
- weighted randomized preference
in the way it can be seen here:
http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-x86_64.iso?mirrorlist
Do you have ideas how I would map those onto the metalink scheme?
Assuming that the order of <url> elements doesn't matter, I could, for
instance, add 200 to the scores of mirrors from the same country, and
add 100 to the mirrors from the same region -- or something like that --
to give them the high score that they deserve when they are
geographically near.
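A minimal sketch of that bonus idea in C, just to make it concrete. The
struct and its field names are made up for illustration, and I assume
here that the two bonuses are not cumulative (a same-country mirror gets
+200, not +300). Note that this only keeps "same country" ahead of "same
region" as long as the base scores differ by less than 100:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror record; the field names are illustrative only. */
struct mirror {
    const char *country;  /* country code, e.g. "de" */
    const char *region;   /* region/continent code, e.g. "eu" */
    int score;            /* base score (arbitrary positive integer) */
};

/* Return the base score plus a proximity bonus:
 * +200 for the client's own country, otherwise +100 for its region. */
int boosted_score(const struct mirror *m,
                  const char *client_country, const char *client_region)
{
    assert(m != NULL && client_country != NULL && client_region != NULL);
    if (strcmp(m->country, client_country) == 0)
        return m->score + 200;
    if (strcmp(m->region, client_region) == 0)
        return m->score + 100;
    return m->score;
}
```

With a base score of 50, a German mirror would score 250 for a German
client, 150 for a French client, and stay at 50 for a US client.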
> - can location=.. be upper case as well as lower case?
Does anyone know? I'm considering switching to all lowercase in our database,
because it is less awkward to type (when administering the mirror database from
the commandline, for instance). If it doesn't matter for metalink clients, it
would be good to know.
It would also be good if the metalink spec made these things
explicit :-)
Thanks a lot!
> What do you think about this? What our redirector constructs is a list
> of mirrors sorted by:
>
> - region
> - country
> - weighted randomized preference
>
> in the way it can be seen here:
> http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-x86_64.iso?mirrorlist
>
> Do you have ideas how I would map those onto the metalink scheme?
> Assuming that order of <url> elements doesn't matter, I could, for
> instance, add 200 to the scores of mirrors from the same country, and
> add 100 to the mirrors from the same region -- or something like that --
> to give them the high enough score that they deserve, if they are
> geographically near.
Geo McFly - "On the fly generator of metalinks based on their
geographical location." by Per.
http://sourceforge.net/projects/geomcfly
http://metalink-discussion.googlegroups.com/web/geoloc_database-v1.00alpha.txt
classifies countries
(from nearest to farthest from...)
HTH ?
For geomcfly I started at preference 100 for the closest location (IIRC
that is the max value in the metalink specification), then 99 for the
next, etc., all the way down to 0, which is given as the preference for
all the remaining mirrors. I originally gave them preference in % of
proximity, but since the metalink specification says the preference value
should be an int and not a float, that was obviously not feasible.
Mirrors at the same location are given the same priority.
An example is provided at http://pastebin.ca/969690 (yes, mind the bug that
makes it -2 instead of -1 for the next mirror after mirrors with the same
preference ;p).
For the preference of mirrors, the distance between mirrors was calculated
using the mirrors' latitude and longitude coordinates.
I think this is not only a simpler solution than the geoloc database and
similar, but also a lot more elegant.
Another improvement could be to ensure that mirrors within the same country
are given higher priority than mirrors in other countries that are closer in
terms of geographic location, since bandwidth within the local country is
usually better.
You might find other useful ideas as well in geomcfly. :)
Another idea I have proposed is to apply a ratio to this % which:
- raises the mirror servers with the most bandwidth and lowers those
with the least
- raises mirrors located in a well known "internet country" (such as
France, USA, Netherlands, Germany, Japan, Singapore...) and lowers
mirrors located in small countries (such as Monaco, Portugal...)
- raises the real distro mirror servers and lowers the distros' own
main FTP servers
On Thu, Apr 03, 2008 at 06:52:55PM +0200, Sebastien WILLEMIJNS wrote:
> Geo McFly - "On the fly generator of metalinks based on their
> geographical location." by Per.
> http://sourceforge.net/projects/geomcfly
Thanks everybody for the suggestion --
Right now there are so many things happening that I have a hard time
catching up :-) I appreciate very much that you pointed me to it.
I looked at it briefly last night.
As far as I understand it, it is employed on the server side. This means
that the server needs a database which resolves IP addresses into
geographical coordinates, right? The free GeoIP database doesn't do
that, to my knowledge. The version that is available for a yearly
(monthly?) fee surely does, but that's not available to us so far, for
budget reasons.
I have looked around for alternatives. The one that came closest was a
350MB sized free MySQL database. The rsync server which hosted it ceased
to host it. However, I just checked and the site is still there, and
there are other rsync servers now: http://hostip.info
(Have any of you worked with data from hostip.info?)
The thing with all such databases is that they become significantly
larger than the simple country database. Thus, they could put some
limitation on the scalability of the server.
Also, even though there are some databases for that, their correctness
seems to vary quite a bit.
Anyway, for reasons of scalability, I decided (so far) that all
sophisticated work should be done on the client side.
What the server _can_ do, without any scalability issues, is provide the
geographical coordinates of the mirrors. Thus, the client has the
opportunity to make intelligent use of them. (Is there any such facility
in the client libraries? I guess not.)
> http://metalink-discussion.googlegroups.com/web/geoloc_database-v1.00alpha.txt
> classifies countries
> (from nearest to farest from...)
Interesting!
Is any client using that?
Or some metalink server?
Does geomcfly use it?
right now, they should grok RFC 822 but we plan to move to RFC 3339
with the release of the new spec. in reality, I think most clients
don't make use of the date info.
I pointed you towards the draft spec because we plan to release it
relatively soon. I hope that didn't cause too much confusion.
what do you think about the different date formats?
I'm interested to hear what you find out about Apache/libapr and RFC 3339
> The latter is taken from a real example:
> http://download.packages.ro/metalink/openoffice/OOo_1_0_2_SolarisIntel_install_tar_gz.metalink
>
> Or put the other way round, can all clients be expected to parse the
> "2006-05-15T00:01Z" format from the spec?
right now, no clients parse RFC 3339. but if/when we switch over,
we'll try to switch all the metalink generators over at once.
> I implemented it like this now:
>
> % date
> Thu Apr 3 16:33:47 CEST 2008
> % curl -s 'http://localhost/zrkadlo/distribution/10.3/repo/oss/GPLv3.txt?clientip=202.72.191.2&metalink' | grep date
> pubdate="2008-04-03T14:33Z"
> refreshdate="2008-04-03T14:33Z">
>
> This is meant to be GMT (I'm in the GMT+2 timezone).
>
> Does that look right?
looks good!
> > - play with the score values, possibly put the scores of mirrors from
> > the same country (or same region) higher?
> > - Find out how clients deal with location and preference, if both are
> > present, and which they give the higher weight
> > - spec says, preference is a value of 1 to 100, while we use
> > arbitrary positive integers
> > - does the order of <url> elements in the metalink matter?
>
> What do you think about this? What our redirector constructs is a list
> of mirrors sorted by:
>
> - region
> - country
> - weighted randomized preference
>
> in the way it can be seen here:
> http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-x86_64.iso?mirrorlist
>
> Do you have ideas how I would map those onto the metalink scheme?
> Assuming that order of <url> elements doesn't matter, I could, for
> instance, add 200 to the scores of mirrors from the same country, and
> add 100 to the mirrors from the same region -- or something like that --
> to give them the high enough score that they deserve, if they are
> geographically near.
the order of <url> elements doesn't matter, except for clients like
GetRight which can only handle 16 mirrors for each file.
the preference aka priority scores go from 100 to 1, with 100 meaning
use first. multiple URLs can have the same value.
not all download clients know the location of the downloader, so
preference will probably dictate which mirrors are used first.
here's a copy & paste from another message on some ideas for this:
A client should use location and preference to determine the best
resources to use.
Assuming location and preference are given in the metalink:
If the client has no locale or location set, it should use preference.
If the client has a locale or location set, it should use local
mirrors first, starting with those with a higher preference (that is
"de" and "100" if in Germany, then "de" and "99", etc).
Nils also provided this algorithm for download clients:
p = url.getPreference();
if (url.getLocation() == myLocation) {
p = 100 + p;
}
assuming:
u1,de, 90
u2,de,100
u3,us,100
would then be calculated (sorted) as:
u2,de,200
u1,de,190
u3,us,100
(assuming you're german ;))
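Written out in C, the rule above could look roughly like this. It is only
a sketch: the struct, the function names and the use of qsort() are my
own choices for illustration, not taken from any existing client. Strings
have to be compared with strcmp() in C, and the comparison here is
case-sensitive, which a real client might not want.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one <url> element. */
struct url_entry {
    const char *name;
    const char *location;  /* country code from location="..." */
    int preference;        /* value given in the metalink */
    int effective;         /* filled in by rank_urls() */
};

/* qsort() comparator: highest effective preference first. */
static int by_effective_desc(const void *a, const void *b)
{
    const struct url_entry *x = a, *y = b;
    return (y->effective > x->effective) - (y->effective < x->effective);
}

/* Add 100 to every URL in the client's own location, then sort. */
void rank_urls(struct url_entry *urls, size_t n, const char *my_location)
{
    assert(urls != NULL || n == 0);
    if (n == 0)
        return;
    for (size_t i = 0; i < n; i++) {
        urls[i].effective = urls[i].preference;
        if (my_location != NULL &&
            strcmp(urls[i].location, my_location) == 0)
            urls[i].effective += 100;
    }
    qsort(urls, n, sizeof urls[0], by_effective_desc);
}
```

Feeding it u1/u2/u3 from the example with my_location set to "de" gives
the order u2 (200), u1 (190), u3 (100).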
> > - can location=.. be upper case as well as lower case?
>
> does someone know? I'm considering to switch to all lowercase in our database,
> because it is less awkward to type (when administering the mirror database from
> the commandline, for instance). If it doesn't matter for metalink clients, it
> would be good to know.
>
> It would also be good if the metalink spec would make these things
> explicit :-)
oops :) it can be upper or lower case. I'm adding that to the spec.
now, this could get confusing but I'll try to reply to your earlier
message. I've snipped a few things I've already answered, or don't
know enough about Apache to give a good response :)
On Wed, Apr 2, 2008 at 5:19 PM, Dr. Peter Poeml <po...@suse.de> wrote:
> On Fri, Mar 28, 2008 at 07:38:30PM -0400, Anthony Bryan wrote:
> > On Thu, Mar 27, 2008 at 9:58 AM, Dr. Peter Poeml <po...@suse.de> wrote:
> > > > Now that KGet supports metalinks, I think some really interesting
> > > > things can be done. Like passing info to Nepomuk or signatures to
> > > > Kgpg. Any ideas?
> > >
> > > Hm, I don't know KGet (I hardly know and use download clients at all), I
> > > don't have ideas about that.
> >
> > I just mean that starting with openSUSE 11 and onward, there will be a
> > built in metalink downloader available to everyone. no need to install
> > any new software to make use of it :)
>
> Hm, as long as the support is only in KDE, it's of no use on the command
> line or on servers. Or on Gnome desktops.
>
> I think what would massively help would be a metalink-enabled libcurl :-)
there's aria2 or metalink checker (python) for command line usage and
wxDownload Fast for Gnome, along with Celerius in development. there's
also metadl, an NSIS Windows plugin that uses libcurl.
a metalink-enabled libcurl or wget would be so very helpful. if you
contacted Daniel Stenberg on one of the curl lists, it might encourage
him. odd that curl uses metalinks on their download page, but doesn't
support it in their download app! :) we can only hope.
> > Yes, metalinks CAN contain checksums, but it is not a requirement,
> > just a recommendation.
> >
> > So, if file verification is not important or causes scaling issues
> > then you can disregard the checksums.
>
> Cool! That means, I could just serve ad-hoc metalinks without checksums
> if no "checksum file" is present on-disk. If there is a "checksum file"
> on disk, the redirector could include its content into the metalink.
> Would be very useful for iso images.
yep, great!
> > > For large files, like disk images, it would not really work to generate
> > > the checksums on the fly, but of course they could be prepared and put
> > > on disk beforehand.
> >
> > Yes, and I would recommend chunk checksums as well so the large files
> > can be repaired. This is what other implementations do, store the
> > whole file and chunk checksums and add the mirror lists on the fly.
>
> Yep. That's what I was just thinking (one paragraph above :-)
>
> For the myriad of small files we can just ditch the checksums. They are
> not so important, because all files are integrity-checked anyway, by
> repository metadata (albeit later).
true, it's not worth it for those small files then.
> * For other clients (users with browsers or download programs) it is
> similar I guess. Of course those could be expected to "click" on a
> special metalink, instead of the normal download link. However, if it
> works transparently, it would be better, wouldn't it?
>
> Therefore, it seems like a brilliant idea (of you) to make clients
> indicate with a header that they understand metalinks. I'm not firm
> enough with the Accept and Accept-* headers to say which one is the
> best/correct one, but I'm sure we'll find one...
I'm not familiar enough with them either, but I agree we can figure it
out in time
> Do you think it is realistic to get all existing metalink-capable
> download clients to add that header (over time)? There doesn't seem to be
> any obstacle to me which would make that difficult.
no, I think it would be pretty easy.
I was talking to someone from Firefox tho, and they apparently try not
to bloat the header. but that can be dealt with in the future.
> Is that mimetype "official"?
no, the mimetype is not official.
> > Another idea that groups like OpenOffice.org have been asking about is
> > a statistics/report server. The benefit would be you have a tracker,
> > like a torrent tracker, that keeps up with download stats over a whole
> > mirror network, reports errors, etc. We haven't had the manpower to
> > attack that yet, but if you have any ideas tell me!
>
> Oh, I think we have something pretty good for that.
>
> Last year, I first wrote the redirector apache module. After that I
> wrote a statistics counting module. It takes the URL paths and splits
> them by specific patterns, and translates those bits into an SQL query
> which increases counters. I kept that separate because _that_ module
> is, contrary to the redirector, specific to openSUSE because of the
> particular URL path parsing (it groks subpackages, version numbers and
> release numbers).
>
> However! that is of course adaptable to other schemes. It is probably
> the most performant way of logging such things anyway. And it naturally
> hooks into the logging phase of Apache's request processing. And it uses
> the same infrastructure.
>
> Maybe I should write something like a howto, or a prototype of how to
> parse other URLs, or even make it more generic. Or at least give
> pointers how to do that :-)
>
> https://forgesvn1.novell.com/svn/opensuse/trunk/tools/download-stats/mod_stats/
> http://en.opensuse.org/Build_Service/Redirector#Other_tidbits
>
> Might be interesting for OpenOffice.org.
that sounds very cool!
On Wed, Apr 2, 2008 at 8:28 PM, Dr. Peter Poeml <po...@suse.de> wrote:
>
> I have a proof of concept. The redirector can create metalink replies
> now. I'd say, it looks great :-)
> Thank you so much for getting me started on this!
gosh, thanks for working on it!
> Things to do:
>
> - implement the rfc 3339 dates (still looking if Apache/libapr has
> infrastructure for that)
again, please let us know if it does. that's important information.
> - charset=UTF-8 is probably wrong in our case.
what should it be?
--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
)) Easier, More Reliable, Self Healing Downloads
On Thu, Apr 03, 2008 at 08:33:26PM +0200, Per Øyvind Karlsen wrote:
>
> På Torsdag 03 april 2008 , 18:52:55 skrev Sebastien WILLEMIJNS:
> > > Do you have ideas how I would map those onto the metalink scheme?
> > > Assuming that order of <url> elements doesn't matter, I could, for
> > > instance, add 200 to the scores of mirrors from the same country, and
> > > add 100 to the mirrors from the same region -- or something like that --
> > > to give them the high enough score that they deserve, if they are
> > > geographically near.
> >
> > Geo McFly - "On the fly generator of metalinks based on their
> > geographical location." by Per.
> > http://sourceforge.net/projects/geomcfly
>
> For geomcfly I started at preference 100 (which IIRC is the max value in
> metalink specification, I originally gave them preference in % of proximity,
> but since metalink specification specifies that preference value should be
> int and not float that was obviously not feasible) for the closest location,
> then 99 for the next etc. all the way down to 0 which will be given as
> preference for all the remaining mirrors.
Thanks for the hint. I figured something similar, too. Starting with 100
and decrementing. That will assume that
- the scores that I have internally are used to create the _order_ (with
the same weighted randomization as before), and
- the order is used to generate the metalink preferences.
That makes sense!
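As a sketch of the second step (order -> preference), assuming the list
is already in its final order. The function name is made up, and the
floor of 1 is my assumption based on the "1 to 100" range mentioned for
the spec (geomcfly goes down to 0 instead):

```c
#include <assert.h>
#include <stddef.h>

/* prefs[i] receives the metalink preference for the mirror at position i
 * of an already ordered list: 100, 99, 98, ... and never below 1. */
void assign_preferences(int *prefs, size_t n)
{
    assert(prefs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        prefs[i] = (i < 100) ? 100 - (int)i : 1;
}
```

Every position gets its own number this way; mirrors beyond the 100th
all share preference 1.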
> Mirrors at the same location are given the same priority.
Ah, okay. That probably makes sense also -- although in my case it might
not matter, because the redirector has already randomized the list for
the client and thereby "chosen" a (first to try) mirror for it.
But does it make a difference for the metalink client? How will it react
if all mirrors have different preferences? It will depend on the client
configuration, I guess, how many connections to open -- or would that
stop the client from parallel downloading?
Do I, anyhow, assume correctly, that it doesn't matter to the client if
it gets three mirrors with preferences 100, 10, 1, or if they are 100,
99, 98? Is that the same?
> An example is provided at http://pastebin.ca/969690 (yes, mind the bug that
> makes it -2 instead of -1 for the next mirror after mirrors with the same
> preference ;p).
Thank you for the example!
> For the preference of mirrors, the distance between mirrors was calculated
> using the mirrors' latitude and longitude coordinates.
> I think this is not only a simpler solution than the geoloc database and
> similar, but also a lot more elegant.
Does that actually mean that you don't take the client's geographical
coordinates into account!?
Heck. I need to think about that for a minute.
Could you explain a bit more please?
> Another improvement could be to ensure that mirrors within the same country
> are given higher priority than mirrors in other countries that are closer in
> terms of geographic location, since bandwidth within the local country is
> usually better.
Hm. That, on the other hand, is what I'm doing right now :-) The
client's own country is the strongest selector for mirrors.
And I have some exceptions in place for that: Australian mirrors are
considered like "same country" for New Zealand clients, for instance.
> You might find other useful ideas as well in geomcfly. :)
I'm sure.
Two considerations:
* I'll need to do all in C, for scalability reasons.
* I'm thinking that the work calculating distances based on
coordinates could best be done in the client. After all, it could be
easy to configure the client with its actual coordinates (although
those can vary if the client is roaming). It would scale best if the
client does that work. That's why I intended to "ship" the mirror
coordinates in my home-baked mirror list which I planned. And the client
_could_ use them, but wouldn't be required to do so.
I feel at home in this forum. Finally :-)
Interesting ideas indeed.
It brings me back to the question I just asked -- whether the
preferences can be used to "protect" a small mirror from too much
traffic:
Considering the following two possibilities, given there are 3 mirrors:
a)
"big" mirror with preference=100
"big" mirror with preference=99
small mirror with preference=98 <----
b)
"big" mirror with preference=100
"big" mirror with preference=99
small mirror with preference=5 <----
Would b) make sure that the small mirror really gets fewer requests than
in a) ?
The thing is, right now our redirector can control this very well --
because it sends only redirects to single mirrors. And it'll choose the
small mirrors less often.
I want to keep that ability. I think it is vital to get many mirrors,
also small ones (which can be fast, they just have smaller disks and
are not supposed to serve substantially more than, say, 1 TB per month).
If there is a risk that the clients will try e.g. all mirrors to find
the fastest one, then it might mean that they also try the small ones at
the end of the list, and overload them.
(One way to prevent that could be to simply _not_ include those in the
metalink every time.)
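A very small sketch of what "not every time" could mean, purely as an
assumption on my side: each mirror gets a share in percent, and the
caller supplies a random draw per request. Both names are hypothetical.

```c
#include <assert.h>

/* Decide whether a mirror is listed in this particular metalink.
 * share_percent: how often (0..100) the mirror should appear.
 * draw: a uniformly distributed integer in 0..99, supplied by the caller. */
int include_in_metalink(int share_percent, int draw)
{
    assert(draw >= 0 && draw < 100);
    return draw < share_percent;
}
```

A small mirror with share_percent=20 would then show up in roughly one
metalink out of five, while a big one with 100 is always listed.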
Peter
> > http://metalink-discussion.googlegroups.com/web/geoloc_database-v1.00alpha.txt
> > classifies countries (from nearest to farest from...)
>
> Interesting!
>
> Is any client using that?
> Or some metalink server?
> Does geomcfly use it?
No to all of these questions... There are some small geolocation errors
in this DB, but I can run the batch that creates it again once a project
really uses this database...
> Does that actually mean that you don't take the client's geographical
> coordinates into account!?
The IP address gives the country, with the help of an ip2country DB.
> I feel at home in this forum. Finally :-)
Welcome to it! We love to meet staff from big projects, so that our (best)
ideas can be (quickly) put to use for the community ;)
> > another idea I have proposed is to use a ratio on this % which:
> > - increases biggest BW mirror server and decrease smallest of them
> > - increases mirror which is on a well known "internet country" (as
> > France, USA, Netherlands, Germany, Japan, Singapore...) and decrease
> > mirror which is located on small countries as Monaco, portugal...)
> > - increases real distro OS mirror server and decrease FTP OS distros
> > main servers of them
>
> Interesting ideas indeed.
>
> It brings me back to the question I just asked -- whether the
> preferences can be used to "protect" a small mirror from too much
> traffic:
>
> Considering the following two possibilities, given there are 3 mirrors:
>
> a)
>
> "big" mirror with preference=100
> "big" mirror with preference=99
> small mirror with preference=98 <----
>
> b)
>
> "big" mirror with preference=100
> "big" mirror with preference=99
> small mirror with preference=5 <----
>
> Would b) make sure that the small mirror really gets fewer requests than
> in a) ?
>
Every number between 98 and 5 gives the same result ;)
> The thing is, right now our redirector can control this very well --
> because it sends only redirects to single mirrors. And it'll choose the
> small mirrors less often.
Sorry, I do not understand.
> I want to keep that ability. I think it is vital to get many mirrors,
> also small ones (which can be fast, they just have smaller disks and
> are not supposed to serve substantially more than, say, 1 TB per month).
>
> If there is a risk that the clients will try e.g. all mirrors to find
> the fastest one, then it might mean that they also try the small ones at
> the end of the list, and overload them.
>
> (One way to prevent that could be to simply _not_ include those in the
> metalink every time.)
IMHO, if randomness (chance) is used in one or several steps of the
metalink build, it is not necessary to find the best method to improve
the metalink ;)
On Thu, Apr 03, 2008 at 05:21:57PM -0400, Anthony Bryan wrote:
>
> On Thu, Apr 3, 2008 at 10:36 AM, Dr. Peter Poeml <po...@suse.de> wrote:
> > On Wed, Apr 02, 2008 at 11:12:40PM -0700, Ant Bryan wrote:
> > > VERY exciting to hear tonight that Peter Poeml has added metalink
> > > replies to the openSUSE redirector!
> >
> > Exciting for me indeed :-)
> >
> > Now some questions to you guys, where I would appreciate little hints:
> >
> >
> > > Things to do:
> > >
> > > - implement the rfc 3339 dates (still looking if Apache/libapr has
> > > infrastructure for that)
> >
> > Do the clients only parse the rfc 3339 date, or do they also grok
> > something like
> > pubdate="Thu, 03 Apr 2008 09:05:25 GMT"
> > (more an rfc 822 example) or
> > pubdate="Mon Feb 4 14:43:24 2008"
> > ?
>
> right now, they should grok RFC 822 but we plan to move to RFC 3339
> with the release of the new spec. in reality, I think most clients
> don't make use of the date info.
>
> I pointed you towards the draft spec because we plan to release it
> relatively soon. I hope that didn't cause too much confusion.
Ah, sorry, yes, I was somehow taking it as if it was set in stone. My
bad.
> what do you think about the different date formats?
>
> I'm interested to hear what you find out about Apache/libapr and RFC 3339
Regarding implementation in Apache, the RFC 822 date format is the easiest
to do, like this:

    /* current time in RFC 822 format */
    char *time_str = apr_palloc(r->pool, APR_RFC822_DATE_LEN);
    apr_rfc822_date(time_str, apr_time_now());
    ap_rprintf(r, "pubdate=\"%s\"\n", time_str);

RFC 3339 involves some more incantation:

    /* put the current time into rfc 3339 date format */
    char time_str[MAX_STRING_LEN];
    apr_size_t len;      /* receives the length written by apr_strftime() */
    apr_time_exp_t tm;
    apr_time_exp_gmt(&tm, apr_time_now());   /* broken-down time in GMT */
    apr_strftime(time_str, &len, MAX_STRING_LEN, "%Y-%m-%dT%H:%MZ", &tm);
That's what I came up with at least. A bit of search showed that RFC
3339 format isn't used anywhere else in Apache/libapr yet.
I don't feel like a real expert on these formats. The bit I'm never
quite sure about is timezones. Presumably, the time is expected to be in
GMT, right?
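For comparison, here is the same idea with only the standard C library,
as a sketch rather than a proposal. The function name is made up. The
trailing "Z" denotes UTC (which is what GMT amounts to here), and gmtime()
converts to UTC regardless of the local timezone. This variant prints
seconds as well; as far as I read RFC 3339, its grammar does include a
seconds field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format t as a UTC timestamp such as "2008-04-03T14:33:47Z".
 * Returns the number of characters written, or 0 on failure. */
size_t format_rfc3339_utc(time_t t, char *buf, size_t buflen)
{
    assert(buf != NULL);
    /* gmtime() is not thread-safe; a threaded server would want a
     * reentrant variant (or APR's apr_time_exp_gmt) instead. */
    const struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return 0;
    return strftime(buf, buflen, "%Y-%m-%dT%H:%M:%SZ", tm);
}
```

For the epoch value 1207233227 (the moment of the `date` output in my
first mail, 16:33:47 CEST) this yields "2008-04-03T14:33:47Z".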
Given that the RFC 822 format is already very prevalent in HTTP,
servers and clients alike most probably have functions for it on board
already. So it might be simpler to stick to that. (My point of view is
pretty HTTP centric, I admit.)
>
> > The latter is taken from a real example:
> > http://download.packages.ro/metalink/openoffice/OOo_1_0_2_SolarisIntel_install_tar_gz.metalink
> >
> > Or put the other way round, can all clients be expected to parse the
> > "2006-05-15T00:01Z" format from the spec?
>
> right now, no clients parse RFC 3339. but if/when we switch over,
> we'll try to switch all the metalink generators over at once.
Good to know.
[ Other confirmation, and additional helpful information snipped ]
Thanks for the algorithm depicting how the clients operate, it helps me
to understand how all this works.
> A client should use location and preference to determine the best
> resources to use.
> Assuming location and preference are given in the metalink:
>
> If the client has no locale or location set, it should use preference.
>
> If the client has a locale or location set, it should use local
> mirrors first, starting with those with a higher preference (that is
> "de" and "100" if in Germany, then "de" and "99", etc).
>
> Nils also provided this algorithm for download clients:
>
> p = url.getPreference();
> if (url.getLocation() == myLocation) {
> p = 100 + p;
> }
>
> assuming:
> u1,de, 90
> u2,de,100
> u3,us,100
> would then (sorted) calculated as:
> u2,de,200
> u1,de,190
> u3,us,100
> (assuming you're german ;))
>
>
>
I see.
> > I think what would massively help would be a metalink-enabled libcurl :-)
>
> there's aria2 or metalink checker (python) for command line usage and
> wxDownload Fast for Gnome, along with Celerius in development. there's
> also metadl a NSIS Windows plugin that uses libcurl.
>
> a metalink-enabled libcurl or wget would be so very helpful. if you
> contacted Daniel Stenberg on one of the curl lists, it might encourage
> him. odd that curl uses metalinks on their download page, but doesn't
> support it in their download app! :) we can only hope.
Indeed, that would be great. In particular because YaST/zypper/libzypp,
the openSUSE download client, uses libcurl for *everything*. :-)
> > Cool! That means, I could just serve ad-hoc metalinks without checksums
> > if no "checksum file" is present on-disk. If there is a "checksum file"
> > on disk, the redirector could include its content into the metalink.
> > Would be very useful for iso images.
>
> yep, great!
About the on-disk checksum snippets to be included when generating
metalinks, I'm considering storing them out of tree. That is because
nobody other than the metalink-generating redirector needs to see them,
and they are not complete metalinks anyway. They would also only
confuse, and make the rsync modules for mirror feeding more complicated,
because I would be excluding them anyway. Also, I won't need to be careful
that they are not overwritten by incoming rsync jobs.
A simple mtime comparison should suffice to make sure that the checksums
are up to date I guess.
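Roughly like this, as a sketch with plain POSIX stat(). The function
names are made up; in the Apache module one would probably use APR's
file info calls instead. "Up to date" is taken to mean that the hashes
file is not older than the file it describes.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Pure comparison: the hashes are fresh if they are not older
 * than the file they describe. */
int hashes_fresh(time_t file_mtime, time_t hashes_mtime)
{
    return hashes_mtime >= file_mtime;
}

/* Returns 1 if the hashes file is fresh, 0 if it is stale
 * or if either file cannot be stat()ed. */
int hashes_file_is_fresh(const char *file_path, const char *hashes_path)
{
    struct stat f, h;
    assert(file_path != NULL && hashes_path != NULL);
    if (stat(file_path, &f) != 0 || stat(hashes_path, &h) != 0)
        return 0;
    return hashes_fresh(f.st_mtime, h.st_mtime);
}
```

When the check fails, the redirector could simply serve the metalink
without checksums.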
> > * For other clients (users with browsers or download programs) it is
> > similar I guess. Of course those could be expected to "click" on a
> > special metalink, instead of the normal download link. However, if it
> > works transparently, it would be better, wouldn't it?
> >
> > Therefore, it seems like a brilliant idea (of you) to make clients
> > indicate with a header that they understand metalinks. I'm not firm
> > enough with the Accept and Accept-* headers to say which one is the
> > best/correct one, but I'm sure we'll find one...
>
> I'm not familiar enough with them either, but I agree we can figure it
> out in time
>
> > Do you think it is realistic to get all existing metalink-capable
> > download clients to add that header (over time)? There doesn't seem to be
> > any obstacle to me which would make that difficult.
>
> no, I think it would be pretty easy.
>
> I was talking to someone from Firefox tho, and they apparently try not
> to bloat the header. but that can be dealt with in the future.
Right now, aria2c sends only:
GET /foobar HTTP/1.0
Host: localhost
Accept-Encoding: gzip
Not even a user agent :-)
We need to be careful -- browsers can send
Accept: text/*, image/*, application/*
and application/* would also match "our" MIME type.
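To illustrate the wildcard problem, a simplified matcher for a single
media range. This is only a sketch: it ignores q-values and parameters,
compares case-sensitively, and "application/metalink+xml" is just the
unofficial type assumed for the example.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if one media range (e.g. "application/*") matches a
 * concrete media type (e.g. "application/metalink+xml"), else 0. */
int media_range_matches(const char *range, const char *type)
{
    assert(range != NULL && type != NULL);
    if (strcmp(range, "*/*") == 0)
        return 1;
    size_t n = strlen(range);
    if (n >= 2 && range[n - 2] == '/' && range[n - 1] == '*')
        return strncmp(range, type, n - 1) == 0;  /* compare "type/" prefix */
    return strcmp(range, type) == 0;
}
```

So a browser sending "application/*" would be treated as if it accepted
metalinks, although it knows nothing about them, while "text/*" would
not match.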
Content-codings are expected to transform the content without loss
of information, not to replace it with something "different".
Transfer-codings similarly.
So maybe a new header is the easiest and most appropriate option.
However, I have one other idea.
A normal "redirect" involves a 302 Found or 307 Temporary Redirect
reply, a Location header with the URL, and a body should follow.
Normally, the body contains a human-readable, "verbose" redirection
information, which is mostly ignored, and not even seen by humans,
because 302's and 307's are normally followed transparently.
The body could actually contain the metalink.
That would mean that the reply is a normal, standard-conformant redirect
which is acceptable for every client -- but a metalink-enabled client
could use the metalink from the body.
Just an idea. It might be a bit insane. But sounds doable.
> > - charset=UTF-8 is probably wrong in our case.
>
> what should it be?
Hm, unsure. (It's like timezones ;) We probably don't have non-ascii
characters in filenames on our server, so it wouldn't matter anyway. But
UTF-8 would be absolutely alright then, too.
The main thing is that the character set *is* declared, to prevent XSS
attacks (and to insert only trusted input, or escape meticulously, you
know...)
I'll stick to UTF-8. :-)
Thanks,
I just noticed some things:
1)
The spec draft speaks about the new rfc 3339 format, but it mentions the
old format in Appendix B:
</xs:attribute>
<xs:attribute name="pubdate" type="RFC822DateTime"/>
2)
The shown examples don't show seconds, only minutes as finest
resolution. Is that really sufficient? Well, it might be, I'm just not
sure about it.
3)
With Apache, there is r->request_time attached to the request object,
which can be used instead of apr_time_now(), which is already filled out
and can save the syscall to time().
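Regarding 2): the RFC 3339 grammar itself includes the seconds field, so
emitting seconds seems the safer choice. A small sketch of both variants,
in Python rather than the C of the Apache module; the function name and the
with_seconds switch are made up here.

```python
from datetime import datetime, timezone


def rfc3339_utc(epoch_seconds, with_seconds=True):
    """Format a Unix timestamp as a UTC date in RFC 3339 style.

    with_seconds=False reproduces the minute-resolution form shown in the
    spec draft's examples (e.g. 2006-05-15T00:01Z).
    """
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ" if with_seconds else "%Y-%m-%dT%H:%MZ"
    return moment.strftime(fmt)


print(rfc3339_utc(0))                      # full resolution
print(rfc3339_utc(0, with_seconds=False))  # minute resolution
```

In the module, the timestamp passed in would be the request time that
Apache has already recorded, rather than a fresh call to the clock.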
For now, '?metalink' needs to be appended, an ugly kludge, but it should
be sufficient to try out the new feature.
You can simply browse http://download.opensuse.org/distribution/ or
http://download.opensuse.org/repositories/ and append the URL to any
file you like. Be aware however that some files are returned directly,
because they are not meant to be redirected.
Example:
http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-i386.iso?metalink
Ah, and there is a way to "simulate" the metalink that a client from a
different IP address would see, by appending clientip=xxx.xxx.xxx.xxx to
the query string.
For now, there are no checksums. The feature to inject the checksums
from an existing on-disk file (*.metalink-hashes) is already
implemented. However, I would like to have those files out-of-tree, and
I will first implement support for that. Once that is done, I'll
generate the checksum files for the large files, and then the metalinks
will include them.
I implemented the preference values like we discussed, starting with 100
and decrementing. I do allocate every number only once, though. That's
because the list of mirrors is already sorted by the weighted randomized
scheme.
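The numbering can be sketched as follows. This is only an illustration of
the scheme described above, not the module's code; the mirror names are
examples, and the floor of 1 for lists longer than 100 entries is my own
assumption (the spec range is 1 to 100).

```python
def assign_preferences(sorted_mirrors, top=100, floor=1):
    """Give each mirror a preference, counting down from `top`.

    The input is expected to be sorted already (best mirror first), so every
    value is handed out once. Assumption: past 100 mirrors the value stays at
    `floor` instead of dropping out of the 1..100 range.
    """
    return [
        (mirror, max(top - index, floor))
        for index, mirror in enumerate(sorted_mirrors)
    ]


# Example host names, for illustration only
for mirror, preference in assign_preferences(
    ["ftp.rz.uni-ulm.de", "mirror-b.example.org", "mirror-c.example.org"]
):
    print(preference, mirror)
```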
I notice that aria2c seems to download only from one mirror. Which might
not be entirely as it is intended.
I would appreciate your comments - and test results, should you try it
out!
> I notice that aria2c seems to download only from one mirror. Which might
> not be entirely as it is intended.
can you give us the metalink for this case ?
I tried it like this:
% aria2c -s 2 -j 2 -M 'http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-i386.iso?metalink'
[#1 SIZE:0B/4,201.7MiB(0%) CN:1 SPD:0.00KiB/s] [FileAlloc:#1 733.2MiB/4,201.7MiB(17%)]
and there is a single open connection to, for instance, 134.60.1.5
(ftp.rz.uni-ulm.de):
# netstat -tupan | grep aria
tcp 73844 0 83.133.126.38:46980 134.60.1.5:80 ESTABLISHED 19819/aria2c
I have to apologize for not carefully checking!
aria2c actually opened more connections -- later:
root@doozer ~ # netstat -tupan | gi aria
tcp 34712 0 83.133.126.38:46988 134.60.1.5:80 ESTABLISHED 19819/aria2c
tcp 101200 0 83.133.126.38:33566 134.60.1.5:80 ESTABLISHED 19819/aria2c
tcp 17376 0 83.133.126.38:44715 134.76.12.5:80 ESTABLISHED 19819/aria2c
tcp 147240 0 83.133.126.38:44716 134.76.12.5:80 ESTABLISHED 19819/aria2c
tcp 86424 0 83.133.126.38:39643 129.143.116.10:80 ESTABLISHED 19819/aria2c
Sorry for the noise.
Is there a way to make aria2c more verbose?
I think I now understand the "CN:5" that it shows. I should have looked
more carefully.
Thanks,
Please wait for the end of file allocation ;) The single TCP connection
was there only FOR THE MOMENT, to grab the file size.
> aria2c actually opened more connections -- later:
after file allocation (see my last message) ;)
> Is there a way to make aria2c more verbose?
--log or -l (too lazy to check the man pages ;)
nice!
can you make it so the file returned is filename.metalink
the extension .metalink is what triggers some clients.
> Ah, and there is a way to "simulate" the metalink that a client from a
> different IP address would see, by appending clientip=xxx.xxx.xxx.xxx to
> the query string.
>
> For now, there are no checksums. The feature to inject the checksums
> from an existing on-disk file (*.metalink-hashes) is already
> implemented. However, I would like to have those files out-of-tree, and
> I will first implement support for that. Once that is done, I'll
> generate the checksum files for the large files, and then the metalinks
> will include them.
>
> I implemented the preference values like we discussed, starting with 100
> and decrementing. I do allocate every number only once, though. That's
> because the list of mirrors is already sorted by the weighted randomized
> scheme.
some very cool features. seems like this is progressing quickly!
> aria2c actually opened more connections -- later:
if you want to limit this:
aria2 should respect "maxconnections" in the metalink, which you could
have defined for each server (to perhaps limit to 1 connection for
each server) or for the whole download (set to 1 if you wanted a
sequential download from 1 server, with no segments).
> Sorry for the noise.
that's not noise!
I think following this development in a totally public forum may be a
good guide to anyone who comes after. I don't think we had a mailing
list when other apps were worked on, so in my opinion this is great,
showing the issues & ideas that come up...
> Is there a way to make aria2c more verbose?
as Sebastien said, -l logfile is the only option I know of. (most
people want aria2 to be quiet!)
I probably have more to write on these matters, but I gotta run now!
I'll do that, once I find some time to work on it. Hopefully soon :-)
The question is, though, is there a way to achieve that, even if the
request was on "filename" (without ".metalink" extension)? The only idea
that comes to my mind would be a "Content-Disposition" header with
"filename.metalink" in it. Would that be a / the way to do it?
I also plan to move away from the "append ?metalink" way to retrieve the
metalink. For example, I could simply treat a request for
filename.metalink as a request to generate a metalink for filename. That
would solve the above issue, because the client will get exactly what it
requests (and trigger on it). However, this would mean (for me) that I
would need some other pages containing links with .metalink appended,
effectively pointing to the generated metalinks.
So what I'm really in favor of is the transparent way to achieve that,
which we already discussed!
Great -- I didn't know that, apparently I managed to miss that when I
looked around.
> > > http://metalink-discussion.googlegroups.com/web/geoloc_database-v1.00alpha.txt
> > > classifies countries
> > > (from nearest to farthest from...)
> >
> > Interesting!
> >
> > Is any client using that?
> > Or some metalink server?
> > Does geomcfly use it?
> geomcfly only uses the free geoip database.
> If you want to test it, I can probably make some better instructions for it.
> Currently geomcfly isn't entirely ready for real-world usage, and it might
> never be, since alternative ways will probably be used for Mandriva Linux,
> which it was designed for. I suspect that your implementation will be far
> better anyway. I'd love to help out in any way I can on your project,
> though; I've spent quite some time on the topic myself and should probably
> have some usable input for you. :)
Thanks, we'll talk more about it soon, I'm sure!
On Thu, Apr 10, 2008 at 05:43:16PM +0200, Dr. Peter Poeml wrote:
> > can you make it so the file returned is filename.metalink
> >
> > the extension .metalink is what triggers some clients.
>
> I'll do that, once I find some time to work on it. Hopefully soon :-)
>
> The question is, though, is there a way to achieve that, even if the
> request was on "filename" (without ".metalink" extension)? The only idea
> that comes to my mind would be a "Content-Disposition" header with
> "filename.metalink" in it. Would that be a / the way to do it?
An RFC 2183 Content-Disposition header was indeed a way to solve it.
Now aria2c parses and uses the metalink right away:
% aria2c 'http://localhost/zrkadlo/distribution/10.3/repo/oss/GPLv3.txt?metalink'
[#1 SIZE:7.3KiB/7.3KiB(100%) CN:0]
2008-04-19 16:41:05 NOTICE - Download complete: ./GPLv3.txt.metalink
[#2 SIZE:34.3KiB/34.3KiB(100%) CN:4] [Checksum:#2 0B/34.3KiB(0%)]
2008-04-19 16:41:07 NOTICE - Verification finished successfully. file=./GPLv3.txt
2008-04-19 16:41:07 NOTICE - Download complete: ./GPLv3.txt
Download Results:
(OK):download completed.(ERR):error occurred.(INPR):download in-progress.
gid|stat|path/URI
===+====+======================================================================
1| OK|./GPLv3.txt.metalink
2| OK|./GPLv3.txt
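For reference, a sketch of how such a header value could be derived from
the request path. The disposition type "attachment" and the function name
are assumptions made for this example; the simple quoting assumes file
names without quotes or backslashes.

```python
def metalink_disposition(request_path):
    """Return a Content-Disposition value naming the reply <file>.metalink."""
    name = request_path.rsplit("/", 1)[-1]
    if not name.endswith(".metalink"):
        name += ".metalink"
    # Assumption: names on the server contain no '"' or '\\' characters
    return 'attachment; filename="%s"' % name


print(metalink_disposition("/zrkadlo/distribution/10.3/repo/oss/GPLv3.txt"))
```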
On Fri, Apr 04, 2008 at 07:30:26PM +0200, Dr. Peter Poeml wrote:
> ...up and running.
>
> For now, '?metalink' needs to be appended, an ugly kludge, but it should
> be sufficient to try out the new feature.
Update:
The above-mentioned kludge of appending the query string "?metalink"
can be regarded as obsolete now. Well, it still works; I haven't disabled
it yet, because there is some documentation out there about it. But I
implemented two other ways to request a metalink today:
1) by, more naturally, appending ".metalink" to the filename
2) by adding application/metalink+xml into the Accept request header
Example for 1) would be:
http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-i386.iso.metalink
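Put together, the three triggers could be checked roughly like this. It is
a sketch in Python, not the module's actual logic, and wants_metalink is a
made-up name. The Accept check here is deliberately literal, so wildcard
ranges such as application/* do not trigger a metalink.

```python
from urllib.parse import parse_qs, urlsplit

METALINK_TYPE = "application/metalink+xml"


def wants_metalink(url, accept_header=""):
    """Decide whether a request asks for a metalink.

    Triggers: a ".metalink" suffix on the path, a "metalink" key in the
    query string (the old kludge), or the metalink type listed literally
    in the Accept header.
    """
    parts = urlsplit(url)
    if parts.path.endswith(".metalink"):
        return True
    # keep_blank_values makes a bare "?metalink" show up as a key
    if "metalink" in parse_qs(parts.query, keep_blank_values=True):
        return True
    accepted = [
        item.split(";", 1)[0].strip().lower()
        for item in accept_header.split(",")
    ]
    return METALINK_TYPE in accepted


base = "http://download.opensuse.org/distribution/10.3/iso/dvd/openSUSE-10.3-GM-DVD-i386.iso"
print(wants_metalink(base + ".metalink"))
print(wants_metalink(base + "?metalink"))
print(wants_metalink(base, "application/metalink+xml"))
print(wants_metalink(base, "text/*, image/*, application/*"))
```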
Best,