Updated Metalink plugin for Apache Traffic Server


Jack Bates

Feb 5, 2014, 6:33:42 PM
to metalink-...@googlegroups.com
I just pushed an update to the Metalink plugin for Apache Traffic Server [1].
The update fixes a segfault that was reported by Faysal Banna [2].

[1]  https://cwiki.apache.org/confluence/display/TS/Metalink
[2]  https://github.com/jablko/dedup/issues/1

I pushed the updated plugin to GitHub (hopefully it will also be distributed with the next Traffic Server release).
To build it,

   1) download the updated metalink.cc file [3],
   2) replace the plugins/experimental/metalink/metalink.cc file in your Traffic Server source tree with the updated file,
   3) and rebuild Traffic Server by rerunning "make".

[3]  https://raw.github.com/jablko/dedup/master/metalink.cc

Here is a real example of what this plugin does:
If I download the latest version of LibreOffice, their download redirector (MirrorBrain) sends me to tdf.mirror.rafal.ca.
But if several users are all sitting behind Traffic Server and someone has already downloaded this file from mirror.nexcess.net, the response from MirrorBrain is rewritten to redirect me to that mirror instead. Then I get the copy that Traffic Server cached, instead of fetching it again over our internet connection.

Here is the response I get from MirrorBrain:

$ curl -v download.documentfoundation.org/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz > /dev/null

< HTTP/1.1 302 Found
< Link: <http://tdf.mirror.rafal.ca/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz>; rel=duplicate; pri=1; geo=ca
< Link: <http://mirror.nexcess.net/tdf/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz>; rel=duplicate; pri=2; geo=us
< Digest: SHA-256=YVJGdJtB7E2kxPTBjLPBwd4zhlgiDclzqBUTWyvzGkk=
< Location: http://tdf.mirror.rafal.ca/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz
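For anyone who wants to look at these headers programmatically, here is a minimal Python sketch (not part of the plugin, which is written in C++). It pulls the rel=duplicate mirrors out of the Link headers in priority order and decodes the SHA-256 value from the Digest header. The helper names are made up for illustration, and the parser assumes one link per header value, as in the response above; it is not a full Link header parser.

```python
import base64
import re

# Header values copied from the MirrorBrain response quoted above.
LINK_HEADERS = [
    "<http://tdf.mirror.rafal.ca/libreoffice/stable/4.2.0/rpm/x86/"
    "LibreOffice_4.2.0_Linux_x86_rpm.tar.gz>; rel=duplicate; pri=1; geo=ca",
    "<http://mirror.nexcess.net/tdf/libreoffice/stable/4.2.0/rpm/x86/"
    "LibreOffice_4.2.0_Linux_x86_rpm.tar.gz>; rel=duplicate; pri=2; geo=us",
]
DIGEST_HEADER = "SHA-256=YVJGdJtB7E2kxPTBjLPBwd4zhlgiDclzqBUTWyvzGkk="


def parse_link(value):
    """Split one Link header value into (url, params). Hypothetical helper."""
    match = re.match(r"\s*<([^>]*)>(.*)", value)
    if match is None:
        raise ValueError("not a Link header value: %r" % value)
    url, rest = match.groups()
    params = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return url, params


def duplicate_mirrors(link_values):
    """Return the rel=duplicate URLs, lowest pri value (most preferred) first."""
    mirrors = []
    for value in link_values:
        url, params = parse_link(value)
        if params.get("rel") == "duplicate":
            # Links without a pri parameter sort last (an assumption of this sketch).
            mirrors.append((int(params.get("pri", "999999")), url))
    return [url for _, url in sorted(mirrors)]


def parse_sha256_digest(value):
    """Return the raw digest bytes from a 'SHA-256=<base64>' header value."""
    algorithm, _, encoded = value.partition("=")
    if algorithm.strip().upper() != "SHA-256":
        raise ValueError("unsupported digest algorithm: %r" % algorithm)
    return base64.b64decode(encoded)


if __name__ == "__main__":
    for url in duplicate_mirrors(LINK_HEADERS):
        print(url)
    print(len(parse_sha256_digest(DIGEST_HEADER)), "digest bytes")
```

The digest is what matters for the plugin: it identifies the file contents independently of which mirror URL served them.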

But if I cause the proxy to cache the file from mirror.nexcess.net ...

$ curl -vx localhost:8080 mirror.nexcess.net/tdf/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz > /dev/null

... then the proxy will rewrite the response from MirrorBrain:

$ curl -vx localhost:8080 download.documentfoundation.org/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz > /dev/null

< HTTP/1.1 302 Found
< Location: HTTP://mirror.nexcess.net/tdf/libreoffice/stable/4.2.0/rpm/x86/LibreOffice_4.2.0_Linux_x86_rpm.tar.gz

And I will get the file that the proxy cached.
I hope other people will find this useful.

Anthony Bryan

Feb 7, 2014, 3:57:59 PM
to Metalink Discussion
awesome, thanks Jack!

it's very nice of you to keep this updated, working, & also committed
back to ATS.

btw, the README is great. I'm going to include some of it here:


Metalink

Try not to download the same file twice. Improve cache efficiency
and speed up downloads.

Take standard headers and knowledge about objects in the cache and
potentially rewrite those headers so that a client will use a URL
that's already cached instead of one that isn't. The headers are
specified in [RFC 6249] (Metalink/HTTP: Mirrors and Hashes) and
[RFC 3230] (Instance Digests in HTTP) and are sent by various
download redirectors or content distribution networks.


1. Who Cares?

More important than saving a little bit of bandwidth, this saves
users from frustration.

A lot of download sites distribute the same files from many
different mirrors and users don't know which mirrors are already
cached. These sites often present users with a simple download
button, but the button doesn't predictably access the same mirror,
or a mirror that's already cached. To users it seems like the
download works sometimes (takes seconds) and not others (takes
hours), which is frustrating.

An extreme example of this happens when users share a limited,
possibly unreliable internet connection, as is common in parts of
Africa for example.

[How to cache openSUSE repositories with Squid] is another,
different example of a use case where picking a URL that's already
cached is valuable.

2. What it Does

When it sees a response with a "Location: ..." header and a
"Digest: SHA-256=..." header, it checks if the URL in the Location
header is already cached. If it isn't, then it tries to find a URL
that is cached to use instead. It looks in the cache for some
object that matches the digest in the Digest header and if it
succeeds, then it rewrites the Location header with the URL from
that object.

This way a client should get sent to a URL that's already cached
and won't download the file again.
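
The decision described in this section can be sketched in a few lines of Python. This is only an illustration of the logic, not the plugin's code (the plugin is C++ and works against the Traffic Server cache). The ToyCache class, its digest index, and the example.test URLs are assumptions made up for the sketch.

```python
import base64
import hashlib


class ToyCache:
    """In-memory stand-in for the proxy cache (hypothetical, for illustration)."""

    def __init__(self):
        self.by_url = {}          # URL -> cached body
        self.url_by_digest = {}   # raw SHA-256 digest -> URL of a cached copy

    def store(self, url, body):
        self.by_url[url] = body
        self.url_by_digest[hashlib.sha256(body).digest()] = url


def rewrite_location(headers, cache):
    """Return headers with Location pointing at a cached copy when one exists."""
    location = headers.get("Location")
    algorithm, _, encoded = headers.get("Digest", "").partition("=")
    # Only act on responses that carry both Location and a SHA-256 Digest.
    if location is None or algorithm.upper() != "SHA-256":
        return headers
    # The URL the redirector picked is already cached: nothing to do.
    if location in cache.by_url:
        return headers
    # Otherwise look for any cached object with the same digest.
    cached_url = cache.url_by_digest.get(base64.b64decode(encoded))
    if cached_url is None:
        return headers
    rewritten = dict(headers)
    rewritten["Location"] = cached_url
    return rewritten


if __name__ == "__main__":
    body = b"pretend this is a large tarball"
    cache = ToyCache()
    cache.store("http://mirror-b.example.test/file.tar.gz", body)

    response = {
        "Location": "http://mirror-a.example.test/file.tar.gz",
        "Digest": "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode(),
    }
    print(rewrite_location(response, cache)["Location"])
```

Keying the lookup on the digest rather than the URL is what lets a copy fetched from one mirror satisfy a redirect that points at another.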



--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
)) Easier, More Reliable, Self Healing Downloads