Coordinating Hackage mirroring procedures


Michael Snoyman

Sep 16, 2016, 3:51:14 AM
to Haskell-community, stac...@googlegroups.com, Duncan Coutts, Herbert Valerio Riedel, Peter Simons
Hi all,

Duncan, Herbert and I have been having a conversation the past few days about Hackage mirrors, kicked off by [1]. I requested we move the discussion to a public list in case others have some thoughts on what we're discussing. I'll do my best to summarize where we're at and what we're thinking of doing. Duncan and Herbert: please jump in and correct any mistakes :)

* Upstream Hackage is hosted on a private IP address which is pointed at by a CDN. Currently, the connection between the CDN and Hackage is over plaintext HTTP, though that might switch to HTTPS soon.
* Content on Hackage is secured via hackage-security, so users of cabal-install 1.24 are downloading data securely from the CDN.
* There are three different Hackage mirroring tools available:
    * The original hackage-mirror tool in the Hackage repo itself, used during the migration from Hackage 1 to Hackage 2. It is not hackage-security aware.
    * The hackage-mirror package on Hackage, which uploads to AWS S3. This is what hackage.fpcomplete.com uses, and feeds into Stack. It is also not hackage-security aware.
    * Herbert's hackage-mirror-tool, which uploads to Dreamhost S3, and ostensibly supports AWS S3 as well (though not yet tested). It _is_ hackage-security aware.
* We don't really want three different tools, and we _do_ want the mirror tools to be hackage-security aware, especially given the lack of SSL between the CDN and Hackage itself.
* The CDN has the potential to serve out-of-sync files, in particular deliver a 00-index.tar.gz file that refers to a certain package/version, but will return a 404 for the package tarball. This can cause confusion for downstream tools. The Dreamhost and AWS mirrors do not suffer from this.
* There are three additional Git repositories providing Hackage metadata in different formats: all-cabal-files, all-cabal-hashes, and all-cabal-metadata. These are used by Stack, Stackage, Nix, and stackage.org. They feed off of the AWS S3 mirror.
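To make the out-of-sync problem concrete, here is a minimal sketch of the kind of check a downstream tool could run: for every package/version the index mentions, confirm the tarball is actually available. It is not taken from any of the tools above. The index entry shape (`<pkg>/<version>/<pkg>.cabal`), the tarball path layout, and the function names are all assumptions for illustration, and `tarball_exists` is a hypothetical callback standing in for an HTTP request or an S3 lookup.

```python
from typing import Callable, Iterable, List, Tuple


def packages_in_index(member_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Extract (package, version) pairs from index member paths.

    Assumes entries shaped like "<pkg>/<version>/<pkg>.cabal"; anything
    else (for example a preferred-versions file) is skipped.
    """
    found = []
    for name in member_names:
        parts = name.split("/")
        if len(parts) == 3 and parts[2] == parts[0] + ".cabal":
            found.append((parts[0], parts[1]))
    return found


def missing_tarballs(
    member_names: Iterable[str],
    tarball_exists: Callable[[str], bool],
) -> List[str]:
    """Return tarball paths the index refers to but the server lacks.

    The path layout below is an assumption; a real mirror may differ.
    """
    missing = []
    for pkg, version in packages_in_index(member_names):
        path = f"package/{pkg}-{version}/{pkg}-{version}.tar.gz"
        if not tarball_exists(path):
            missing.append(path)
    return missing
```

A non-empty result means the index is ahead of the tarballs, which is exactly the situation that produces the 404s described above.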

I believe the course of action we have planned is:

1. Ensure that Herbert's hackage-mirror-tool works correctly with AWS S3.
2. Switch over the hackage.fpcomplete.com AWS S3 mirror to use Herbert's hackage-mirror-tool, increasing its security, and allowing it to be listed as an official Hackage mirror.
    * This mirror will still be operated by FP Complete. Distributing the administration of systems makes it less likely that a technical screw-up by one party will take down multiple mirrors.
3. Continue pointing all-cabal-files, -hashes, and -metadata at the AWS S3 mirror, which provides high reliability and will now be more secure based on the hackage-mirror-tool usage.
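For anyone wondering what "hackage-security aware" buys a mirror in practice, the rough idea is that downloaded bytes are checked against trusted metadata instead of being taken on faith from the transport. The sketch below is a deliberately simplified illustration of that idea, not the actual hackage-security protocol or Herbert's implementation. It assumes the trusted metadata supplies an expected length and SHA-256 digest for each file; the function name and parameters are hypothetical.

```python
import hashlib
import hmac


def verify_download(data: bytes, expected_length: int, expected_sha256: str) -> bool:
    """Check downloaded bytes against a trusted length and SHA-256 digest.

    Both expected values are assumed to come from metadata that was
    itself verified; this function does not check any signatures.
    """
    # Cheap check first: a wrong length can never match.
    if len(data) != expected_length:
        return False
    actual = hashlib.sha256(data).hexdigest()
    # Hex digests are compared case-insensitively via lowercasing.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A mirror tool doing a check along these lines would refuse to upload a file that was altered on the plaintext hop between the CDN and Hackage.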

This will scratch a few different itches:

* A new mirror will be available on Hackage for cabal-install users
* The AWS S3 mirror will be more secure
* I can simplify some of my mirroring logic by not having to do workarounds for out-of-sync CDNs

I think this is all doable, and pretty much within reach. I just wanted to get this out in the public first in case others have input, perhaps based on other use cases of these mirrors or repos that I'm not aware of.

Michael
