Thanks Bram,
Notes in the text below.
On 1/28/2013 7:08 PM, Bram Neijt wrote:
> Hi Eliezer,
>
> I'm not yet clear on how Squid would come to know about the metalink.
> Will an admin add the metalink to a list, so you have a list of
> supported/trusted metalinks, or will squid detect the download of a
> metalink and do something with it? Because I was in the mood, I've
> added a section on both scenarios.
This is open to discussion, since metalinks are not yet used in Squid or
in any competing forward proxy I know of.
Squid should learn about a specific metalink file from the headers of
the download or of a download redirect (301/302).
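To make the header idea concrete, here is a minimal sketch in Python. It assumes the origin server or mirror advertises the metalink the way Metalink/HTTP (RFC 6249) describes it, with a Link header carrying rel=describedby and the Metalink 4 media type. The function name find_metalink_urls and the example.com URLs are made up for illustration, and the parsing is deliberately simplistic. It is not how Squid does or would do it.

```python
import re

# Media type used for Metalink 4 documents.
METALINK_TYPE = "application/metalink4+xml"


def find_metalink_urls(link_header_values):
    """Return the URLs of metalink descriptions advertised in Link headers.

    link_header_values is a list of raw Link header values taken from a
    download response or from a 301/302 redirect response.
    """
    found = []
    for value in link_header_values:
        # One Link header may carry several comma-separated entries.
        for entry in re.split(r",\s*(?=<)", value):
            match = re.match(r"\s*<([^>]*)>(.*)", entry, re.S)
            if not match:
                continue
            url, raw_params = match.groups()
            params = {}
            for param in raw_params.split(";"):
                if "=" in param:
                    key, val = param.split("=", 1)
                    params[key.strip().lower()] = val.strip().strip('"').lower()
            rels = params.get("rel", "").split()
            if "describedby" in rels and params.get("type") == METALINK_TYPE:
                found.append(url)
    return found
```

Entries with rel=duplicate (plain mirror links) are ignored here on purpose, since only the metalink description itself is wanted at this step.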
> ==== If Squid detects a metalink download and tries to do something smart
> I don't really see a way of having metalinks interact with the mirror
> path identifier/store-id features. The big problem, I think, is that
> you have to be sure the client requesting any of the metalink urls
> actually has the URL and will verify the integrity of the download
> afterwards. Otherwise the client could just be visiting one of the
> urls mentioned in a metalink that happened to pass through the proxy
> and get the wrong data.
Downloading a metalink file should never, ever cause the proxy to
download the files/URLs listed in it.
That would open one of the biggest security holes possible. (I pray for
the life of any proxy writer who does that.)
> One thing I could think of was having squid detect the URLS in a
> metalink file as probably almost static and up the cache time
> regardless of what the real HTTP server hosting the file would respond
> in the future.
This is similar to one of my ideas: use the metalink file referenced in a
download header to build a small metalink-based DB, which would use the
first matching URL from the metalink file as the store-id for that
file/object.
For that to work, every mirror would have to send a header pointing to
the metalink file of the download.
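A rough sketch of that small DB, in Python, assuming the metalink is a Metalink 4 document (XML namespace urn:ietf:params:xml:ns:metalink) that has already been fetched and is trusted. The name build_store_id_map is made up, and the sketch ignores hashes, priorities and locations entirely.

```python
import xml.etree.ElementTree as ET

# XML namespace of Metalink 4 documents.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def build_store_id_map(metalink_xml):
    """Map every mirror URL in a metalink to the first URL of its file.

    The first <url> listed for a <file> is used as the store-id for all
    the other URLs of that same file.
    """
    root = ET.fromstring(metalink_xml)
    store_ids = {}
    for file_el in root.findall("ml:file", NS):
        urls = [u.text.strip() for u in file_el.findall("ml:url", NS) if u.text]
        if not urls:
            continue
        canonical = urls[0]
        for url in urls:
            # Keep the first mapping if a URL shows up more than once.
            store_ids.setdefault(url, canonical)
    return store_ids
```

A lookup in the returned dict with the requested URL then gives the store-id, or nothing if the URL is not a known mirror.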
> ==== If Squid trusts the metalink content, for example it was added by an admin
> Then squid could use the urls to generate a regex for the store-id
> extraction and you could have Squid consider the different urls as
> equal using your plugin. You might need to add some functionality to
> your plugin (for example use a search replace regex instead of "group
> 1 determines the store id", which I gathered from your example).
Squid, or any other software that reads the metalink file/content, has to
be reliable or else it won't be used at all.
The feature (plugin) is an interface to Squid which lets other software
do all the other work that is needed, such as regex matching.
The logic I am looking for is either a helper that does what is needed
before the download starts, or Squid doing it internally.
In order to use a regex of any kind I need to know what to match against.
This is where I want to use metalinks.
MirrorBrain makes life simpler here, since it can give me a list of
mirrors with exactly the same content, so I can map a whole list of
mirrors to one store-id match.
This is not really a direct metalink feature, though.
Maybe metalinks could get an option that allows more than just an entry
like:
<url type="http">http://mirror.aptus.co.tz/pub/ubuntu/12.10/ubuntu-12.10-desktop-i386.iso</url>
but a more general way to define it for static CDNs, such as:
<url type="http" domain="mirror.aptus.co.tz" base_path="pub/ubuntu/">ubuntu/12.10/ubuntu-12.10-desktop-i386.iso</url>
or any other form.
I know, for example, that Fedora uses a specific path structure.
This is the local mirror here in Israel:
http://mirror.isoc.org.il/pub/fedora/releases/17/Fedora/x86_64/iso/Fedora-17-x86_64-netinst.iso
When I download from:
http://download.fedoraproject.org/pub/fedora/linux/releases/17/Fedora/x86_64/iso/Fedora-17-x86_64-netinst.iso
I get redirected to my local mirror which, as you can see, has a specific
path structure that could be described much more simply than by writing
the whole URL in the metalink file.
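For illustration, the two Fedora URLs above differ only in a per-mirror prefix, so a helper could map both to one store-id with a very small table. This is only a sketch in Python: the prefix table is hand-written here (in practice it would come from a metalink or from MirrorBrain), and the store-id prefix http://fedora.mirrors.internal/ and the name store_id_for are made up.

```python
import re

# Hypothetical mirror table: the part of the URL that differs per mirror.
FEDORA_PREFIXES = [
    "http://download.fedoraproject.org/pub/fedora/linux/",
    "http://mirror.isoc.org.il/pub/fedora/",
]

# Made-up internal prefix shared by all objects of this mirror group.
STORE_ID_PREFIX = "http://fedora.mirrors.internal/"

# One regex for the whole mirror list; group 1 is the mirror-independent path.
MIRROR_RE = re.compile(
    "^(?:" + "|".join(re.escape(p) for p in FEDORA_PREFIXES) + ")(.+)$"
)


def store_id_for(url):
    """Return the shared store-id for a known mirror URL, else None."""
    match = MIRROR_RE.match(url)
    if match is None:
        return None
    return STORE_ID_PREFIX + match.group(1)
```

With this table both the download.fedoraproject.org URL and the mirror.isoc.org.il URL give the same store-id, and any URL outside the table is left alone.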
If the CDN is managed by one node and the metalinks are compiled with a
simple algorithm, why not present the mirror list to the client in the
same simple form in which the main node sees it?
In any case, most metalinks in a big CDN are generated dynamically by AS
or country code.
I also remember that a path can be used in the file name, which allows a
smaller file with fewer redundant strings in it.
Maybe there could be a file format other than metalink for this specific
purpose. What do you think?
> Good luck with the plugin!
>
> Bram
>
Thanks,
Eliezer