Improvements to package hosting and security


Michael Snoyman

Apr 13, 2015, 6:02:46 AM
to Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Many of you saw the blog post Mathieu wrote[1] about having more composable community infrastructure, which in particular focused on improvements to Hackage. I've been discussing some of these ideas with both Mathieu and others in the community working on some similar thoughts. I've also separately spent some time speaking with Chris about package signing[2]. Through those discussions, it's become apparent to me that there are in fact two core pieces of functionality we're relying on Hackage for today:

* A centralized location for accessing package metadata (i.e., the cabal files) and the package contents themselves (i.e., the sdist tarballs)
* A central authority for deciding who is allowed to make releases of packages, and make revisions to cabal files

In my opinion, fixing the first problem is in fact very straightforward to do today using existing tools. FP Complete already hosts a full Hackage mirror[3] backed by S3, for instance, and having the metadata mirrored to a Git repository as well is not a difficult technical challenge. This is the core of what Mathieu was proposing as far as composable infrastructure, corresponding to next actions 1 and 3 at the end of his blog post (step 2, modifying Hackage, is not a prerequisite). In my opinion, such a system would far surpass our current infrastructure in usability, reliability, and extensibility, and could be rolled out in a few days at most.

However, that second point- the central authority- is the more interesting one. As it stands, our entire package ecosystem is placing a huge level of trust in Hackage, without any serious way to vet what's going on there. Attack vectors abound, e.g.:

* Man in the middle attacks: as we are all painfully aware, cabal-install does not support HTTPS, so a MITM attack on downloads from Hackage is trivial
* A breach of the Hackage Server codebase would allow anyone to upload nefarious code[4]
* Any kind of system level vulnerability could allow an attacker to compromise the server in the same way

Chris's package signing work addresses most of these vulnerabilities, by adding a layer of cryptographic signatures on top of Hackage as the central authority. I'd like to propose taking this a step further: removing Hackage as the central authority, and instead relying entirely on cryptographic signatures to release new packages.

I wrote up a strawman proposal last week[5] which clearly needs work to be a realistic option. My question is: are people interested in moving forward on this? If there's no interest, and everyone is satisfied with continuing with the current Hackage-central-authority, then we can proceed with having reliable and secure services built around Hackage. But if others- like me- would like to see a more secure system built from the ground up, please say so and let's continue that conversation.

[4] I don't think this is just a theoretical possibility for some point in the future. I have reported an easily triggerable DoS attack on the current Hackage Server codebase, which has been unresolved for 1.5 months now.

Michael Snoyman

Apr 13, 2015, 6:28:35 AM
to Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Also, since it's relevant, here's a Github repo with all of the cabal files from Hackage which (thanks to a cron job and Travis CI) automatically updates every 30 minutes:

Francesco Ariis

Apr 13, 2015, 8:21:48 AM
to haskel...@haskell.org, haskell-inf...@community.galois.com, commerci...@googlegroups.com
On Mon, Apr 13, 2015 at 10:02:45AM +0000, Michael Snoyman wrote:
> I wrote up a strawman proposal last week[5] which clearly needs work to be
> a realistic option. My question is: are people interested in moving forward
> on this? If there's no interest, and everyone is satisfied with continuing
> with the current Hackage-central-authority, then we can proceed with having
> reliable and secure services built around Hackage. But if others- like me-
> would like to see a more secure system built from the ground up, please say
> so and let's continue that conversation.

I finished reading the proposal; the only minor remark I have is on this sentence:

" Each signature may be revoked using standard GPG revokation.

It is the /key/ being revoked really, not the individual signature (in our case it would mean revoking every-package-version-or-revision-signed-by-that-key). This in turn highlights the need for a well-defined process on how to handle "key transitions" (a task left to the individual implementers).


A distributed and secure hackage sounds like a dream, I really hope this
comes to life!

Michael Snoyman

Apr 13, 2015, 10:52:57 AM
to haskel...@haskell.org, haskell-inf...@community.galois.com, commerci...@googlegroups.com
I think I was just wrong in that part of the proposal; it wouldn't be "standard GPG revocation" since, as you point out, that's for revoking a key. We'd need a custom revocation mechanism to make this work.

But as to your more general point: there was an added layer of indirection that I considered but didn't write up, but I happen to like. The idea would be that all of the authorization lists would work based off of an identifier (e.g., an email address). We would then have a separate mapping between email addresses and GPG public keys, which would follow the same signature scheme that all of the other files in the repo follow.

The downside to this is that it redoes the basic GPG keysigning mechanism to some extent, but it does address key transitions more easily.

Another possibility would be to encode the release date of each package/version and package/version/revision, and use that date for checking the validity of keys. That way, old signatures remain valid in perpetuity.

I'll admit to my relative lack of experience with GPG, so there's probably some built-in mechanism for addressing this kind of situation which would be better to follow.
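
To make the indirection concrete, here's a very rough sketch of the two mappings I have in mind (Haskell, entirely hypothetical type and field names, purely for illustration):

    import           Data.Map (Map)
    import qualified Data.Map as Map

    type EmailAddress   = String
    type PackageName    = String
    type GpgFingerprint = String

    -- Who may upload/revise each package, expressed only as identifiers.
    type AuthorizationList = Map PackageName [EmailAddress]

    -- Separate, signed file mapping each identifier to the key(s) it may
    -- use. A key transition only edits this file; the per-package
    -- authorization lists never need to change.
    type KeyMapping = Map EmailAddress [GpgFingerprint]

    -- A signature is acceptable if the signing key belongs to an
    -- identifier authorized for that package.
    isAuthorized :: AuthorizationList -> KeyMapping
                 -> PackageName -> GpgFingerprint -> Bool
    isAuthorized auth keys pkg fpr =
      case Map.lookup pkg auth of
        Nothing     -> False
        Just owners -> any (\e -> fpr `elem` Map.findWithDefault [] e keys) owners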
 
A distributed and secure hackage sounds like a dream, I really hope this
comes to life!


Dennis J. McWherter, Jr.

Apr 13, 2015, 10:55:31 AM
to commerci...@googlegroups.com, haskell-inf...@community.galois.com, haskel...@haskell.org
This proposal looks great. The one thing I am failing to understand (and I recognize the proposal is in early stages) is how to ensure redundancy in the system. As far as I can tell, much of this proposal discusses the centralized authority of the system (i.e. ensuring secure distribution) and only references (with little detail) the distributed store. For instance, say I host a package on a personal server and one day I decide to shut that server down; is this package now lost forever? I do see this line: "backup download links to S3", but this implies that someone is willing to pay for S3 storage for all of the packages.

Are there plans to adopt a P2P-like model or something similar to support any sort of replication? Public resources like this seem to come and go, so it would be nice to avoid some of the problems associated with high churn in the network. That said, there is an obvious cost to replication. Likewise, the central authority would have to be updated with new, relevant locations to find the file (as it is currently proposed).

In any case, as I said before, the proposal looks great! I am looking forward to this.

Michael Snoyman

Apr 13, 2015, 11:00:04 AM
to Dennis J. McWherter, Jr., commerci...@googlegroups.com, haskel...@haskell.org, haskell-inf...@community.galois.com
I purposely didn't get into those details in this document, as they can be layered on top of the setup I described here. The way I'd answer this is twofold:

* FP Complete already hosts all packages on S3, and we intend to continue hosting all packages there in the future
* People in the community are welcome (and encouraged) to make redundant copies of packages, and then add hash-to-URL mappings to the main repo giving those redundant copies as additional download locations.

In that sense, the FP Complete S3 copy would simply be one of potentially many redundant copies that could exist.
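
For concreteness, the hash-to-URL mapping could be as simple as something along these lines (a sketch with made-up names; the point is just that one tarball hash can list many mirrors):

    import           Data.Map (Map)
    import qualified Data.Map as Map

    type SHA512Hex = String  -- hex digest of the sdist tarball
    type URL       = String

    -- Every known location serving the exact bytes with that hash.
    type DownloadLocations = Map SHA512Hex [URL]

    -- Anyone can contribute an extra mirror; the hash guarantees all
    -- listed URLs serve identical content.
    addMirror :: SHA512Hex -> URL -> DownloadLocations -> DownloadLocations
    addMirror h url = Map.insertWith (++) h [url]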

Arnaud Bailly | Capital Match

Apr 13, 2015, 11:04:16 AM
to Dennis J. McWherter, Jr., commerci...@googlegroups.com, haskel...@haskell.org, haskell-inf...@community.galois.com, mic...@snoyman.com
Just thinking aloud, but wouldn't it be possible to take advantage of cryptographic ledgers a la Bitcoin for authenticating packages and tracking the history of changes? This would provide redundancy, as the transaction log is distributed, and "naturally" create a web of trust or at least authenticate transactions. People uploading or modifying a package would have to sign a transaction with someone having enough karma to allow this.

Then packages themselves could be completely and rather safely distributed through standard p2p file sharing.

I am not a specialist in crypto money, though.

My 50 cts
Arnaud 



--
Arnaud Bailly

CTO | Capital Match

CapitalMatch

71 Ayer Rajah Crescent | #06-16 | Singapore 139951

(FR) +33 617 121 978 / (SG) +65 8408 7973 | arn...@capital-match.com | www.capital-match.com

Disclaimer:

Capital Match Platform Pte. Ltd. (the "Company") registered in Singapore (Co. Reg. No. 201501788H), a subsidiary of Capital Match Holdings Pte. Ltd. (Co. Reg. No. 201418682W), provides services that involve arranging for multiple parties to enter into loan and invoice discounting agreements. The Company does not provide any form of investment advice or recommendations regarding any listings on its platform. In providing its services, the Company's role is limited to an administrative function and the Company does not and will not assume any advisory, fiduciary or other duties to clients of its services.


Michael Snoyman

Apr 14, 2015, 1:01:17 AM
to Arnaud Bailly | Capital Match, Dennis J. McWherter, Jr., commerci...@googlegroups.com, haskel...@haskell.org, haskell-inf...@community.galois.com
That could work in theory. My concern with such an approach is that- AFAIK- the tooling around that kind of stuff is not very well developed, as opposed to an approach using Git, SHA512, and GPG, which should be easy to combine. But I could be completely mistaken on this point; if existing, well vetted technology exists for this, I'm not opposed to using it.

Carter Schonwald

Apr 14, 2015, 10:56:31 PM
to commerci...@googlegroups.com, arn...@capital-match.com, haskell-inf...@community.galois.com, haskel...@haskell.org
any use of cryptographic primitives of any form NEEDS to articulate what the trust model is, and what the threat model is

likewise, i'm trying to understand who the proposed feature set is meant to serve.

Several groups are in the late stages of building prototypes at varying points in the design space for improving package hosting right now for haskell, and I'm personally inclined to let those various parties release the tools, and then experiment with them all, before trying to push heavily for any particular design that hasn't had larger community experimentation.

I actually care most about being able to have the full package set be git cloneable, both for low pain on premise hackage hosting for corporate intranets, and also for when i'm on a plane or boat and have no wifi.  At my current job, ANY "host packages via s3" approach is totally untenable, and i'm sure among haskell using teams/organizations, this isn't a unique problem! 

The Author authentication/signing model question is an important one, but I'm uncomfortable with just saying "SHA512 and GPG address that". There's A LOT of subtlety to designing a signing protocol that's properly auditable and secure! Indeed, GPG isn't even a darn asymmetric crypto algorithm, it's a program that happens to IMPLEMENT many of these algorithms. If we are serious about having robust auditing/signing, handwaving about the cryptographic parts while saying it's important is ... kinda irresponsible. And frustrating, because it makes it hard to evaluate the hardest parts of the whole engineering problem! The rest of the design is crucially dependent on details of these choices, and yet it's that part which isn't specified.

to repeat myself: there is a pretty rich design space for how we can evolve future hackage, and i worry that speccing things out and design by committee is going to be less effective than encouraging various parties to build prototypes for their own visions of future hackage, and THEN come together to combine the best parts of everyone's ideas/designs. There's so much diversity in how different people use hackage, i worry that any other way will run into failing to serve the full range of haskell users!

cheers

Gershom B

Apr 15, 2015, 12:07:44 AM
to commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
So I want to focus just on the idea of a “trust model” to hackage packages.

I don’t think we even have a clear spec of the problem we’re trying to solve here in terms of security. In particular, the basic thing hackage is a central authority for is “packages listed on hackage” — it provides a namespace, and on top of that provides the ability to explore the code under each segment of the namespace, including docs and code listings. Along with that it provides the ability to search through that namespace for things like package descriptions and names.

Now, how does security fit into this? Well, at the moment we can prevent packages from being uploaded by people who are not authorized. And whoever is authorized is the first person who uploaded the package, or people they delegate to, or people otherwise added by hackage admins via e.g. the orphaned package takeover process. A problem is that this is less of a guarantee than we would like, since e.g. accounts may be compromised, we could be MITMed (or the upload could be), etc.

Hence comes the motivation for some form of signing. Now, I think the proposal suggested is the wrong one — it says “this is a trustworthy package” for some notion of a web of trust or something. Webs of trust are hard and work poorly except in the small. It would be better, I think, to have something _orthogonal_ to hackage or any other package distribution system that attempts a _much simpler_ guarantee — that e.g. the person who signed a package as being “theirs” is either the same person that signed the prior version of the package, or was delegated by them (or hackage admins). Now, on top of that, we could also have a system that allowed for individual users, if they had some notion of “a person’s signature” such that they believed it corresponded to a person, to verify that the _actual_ signature was used. But there is no web of trust, no idea given of who a user does or doesn’t believe is who they say they are or anything like that. We don’t attempt to guarantee anything more than a “chain of custody,” which is all we now have (weaker) mechanisms to enforce.
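
To spell out the shape of that check, here is a minimal sketch (hypothetical Haskell types; “delegation” is modelled simply as a set of keys the previous signer has blessed):

    type KeyId = String

    data Release = Release
      { releaseSigner    :: KeyId
      , releaseDelegates :: [KeyId]  -- keys blessed for future releases
      }

    -- A new release passes the chain-of-custody check if its signer is
    -- the previous signer, someone the previous release delegated to,
    -- or a known admin key.
    custodyOk :: [KeyId] -> Release -> Release -> Bool
    custodyOk admins prev new =
         releaseSigner new == releaseSigner prev
      || releaseSigner new `elem` releaseDelegates prev
      || releaseSigner new `elem` admins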

In my mind, the key elements of such a system are that it is orthogonal to how code is distributed and that it is opt-in/out.

One model to look at might be Apple’s — distribute signing keys widely, but allow centralized revocation if a malicious actor is found. Another notion, somewhat similar, is ssl certificates. Anybody, including a malicious actor, can get such a certificate. But at least we have the guarantee that once we start talking to some party, malicious or otherwise, no other party will “swap in” for them midstream.

In general, what I’m urging is to limit the scope of what we aim for. We need to give users the tools to enforce the level of trust that they want to enforce, and to verify certain specific claims. But if we shoot for more, we will either have a difficult-to-use system, or will fail in some fashion. And furthermore I think we should have this discussion _independent_ of hackage, which serves a whole number of functions, and until recently hasn’t even _purported_ to even weakly enforce any guarantees about who uploaded the code it hosts.

Cheers,
Gershom

Greg Weber

Apr 15, 2015, 12:12:22 AM
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
What security guarantees do we get from this proposal that are not present in Chris's package signing work?
Part of the goal of the package signing is that we no longer need to trust Hackage. If it is compromised and packages are compromised, then anyone using signing tools should automatically reject the compromised packages.

Right now I think the answer is that this provides a security model for revisions: it limits what can be done and formalizes the trust of this process in a cryptographic way. Whereas with Chris's work there is no concept of a (trusted) revision, and a new package must be released?


Michael Snoyman

Apr 15, 2015, 12:34:58 AM
to Carter Schonwald, commerci...@googlegroups.com, arn...@capital-match.com, den...@deathbytape.com, haskel...@haskell.org, haskell-inf...@community.galois.com
On Wed, Apr 15, 2015 at 5:56 AM Carter Schonwald <carter.s...@gmail.com> wrote:
any use of cryptographic primitives of any form NEEDS to articulate what the trust model is, and what the threat model is

likewise, i'm trying to understand who the proposed feature set is meant to serve.

Several groups are in the late stages of building prototypes at varying points in the design space for improving package hosting right now for haskell, and I'm personally inclined to let those various parties release the tools, and then experiment with them all, before trying to push heavily for any particular design that hasn't had larger community experimentation.


I'd be fine with that, if there was public discussion of what those projects are trying to solve. Of the ones that I have asked questions about, I haven't heard any of them trying to address the trust/security issues I've raised here, which is why I'm asking the mailing list if there's interest.

I'm not OK with simply stalling any community process for improving our situation because "someone's working on something related, and it'll be done Real Soon Now(tm)." That's a recipe for stagnation.
 
I actually care most about being able to have the full package set be git cloneable, both for low pain on premise hackage hosting for corporate intranets, and also for when i'm on a plane or boat and have no wifi.  At my current job, ANY "host packages via s3" approach is totally untenable, and i'm sure among haskell using teams/organizations, this isn't a unique problem! 


I agree completely. And similarly, hosting all packages in a Git repository is *also* unusable in other situations, such as normal users wanting to get a minimal set of downloads to get started on a project. That's why I left the download information in this proposal as URLs; you can add different URLs to support Git repository contents as well.

It would also be pretty easy to modify the all-cabal-files repo I pointed to and create a repository containing the tarballs themselves. I don't know if Github would like hosting that much content, but I have no problem helping roll that out.
 
The Author authentication/signing model question in an important one, but I"m uncomfortable with just saying "SHA512 and GPG address that". Theres A LOT of subtlety to designing a signing protocol thats properly audit-able and secure! Indeed, GPG isn't even a darn asymmetric crypto algorithm, its a program that happens to IMPLEMENT many of these algorithms. If we are serious about having robust auditing/signing, handwaving about the cryptographic parts while saying its important is ... kinda irresponsible. And frustrating because it makes it hard to evaluate the hardest parts of the whole engineering problem!  The rest of the design is crucially dependent on details of  these choices, and yet its that part which isn't specified. 


I think you're assuming that my "proposal" was more than a point of discussion. It's not. When starting this thread, I tried to make it clear that this is to gauge interest in creating a real solution. If there's interest, we should figure out these points. If there's no interest, then I'm glad I didn't invest weeks in coming up with a more robust proposal.
 
to repeat myself: there is a pretty rich design space for how we can evolve future hackage, and i worry that speccing things out and design by committee is going to be less effective than encouraging various parties to build prototypes for their own visions of future hackage, and THEN come together to combine the best parts of everyones ideas/designs. Theres so much diversity in how different people use hackage, i worry that any other way will run into failing to serve the full range of haskell users! 


I disagree here pretty strongly. Something with a strong social element requires discussion upfront, not someone creating a complete solution and then trying to impose it on everyone else. There are certainly things that *can* be done without discussion. Hosting cabal and tar.gz files in a Git repo, or mirroring to S3, are orthogonal actions that require no coordination, for instance. But tweaking the way we view the trust model of Hackage is pretty central, and needs discussion.

Michael
 

Michael Snoyman

Apr 15, 2015, 12:47:58 AM
to Gershom B, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
I'd like to ignore features of Hackage like "browsing code" for purposes of this discussion. That's clearly something that can be a feature layered on top of a real package store by a web interface. I'm focused on just that lower level of actually creating a coherent set of packages.

In that realm, I think you've understated what trust we're putting in Hackage today. We have to trust it to:

* Properly authenticate users
* Keep authorization lists of who can make uploads/revisions (and who can grant those rights)
* Allow safe uploads of packages and metadata
* Distribute packages and metadata to users safely

I think we agree, but I'll say it outright: Hackage currently *cannot* succeed at the last two points, since all interactions with it from cabal-install are occurring over non-secure HTTP connections, making it vulnerable to MITM attacks on both upload and download. The package signing work- if completely adopted by the community- would address that.

What I'm raising here is the first two points. And even those points have an impact on the other two points. To draw this out a bit more clearly:

* Currently, authorized uploaders are identified by a user name and a password on Hackage. How do we correlate that to a GPG key? Ideally, the central upload authority would be collecting GPG public keys for all uploaders so that signature verification can happen correctly.
* There's no way for an outside authority to vet the 00-index.tar.gz file downloaded from Hackage; it's a completely opaque, black box. Having the set of authorization rules be publicly viewable, auditable, and verifiable overcomes that.

I'd really like to make sure that we're separating two questions here: (1) Is there a problem with the way we're trusting Hackage today? (2) Is the strawman proposal I sent anywhere close to a real solution? I feel strongly about (1), and very weakly about (2).

Michael Snoyman

unread,
Apr 15, 2015, 12:50:52 AM4/15/15
to Greg Weber, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Yes, I think you've summarized the security aspects of this nicely. There's also the reliability and availability guarantees we get from a distributed system, but that's outside the realm of security (unless you're talking about denial of service).

Greg Weber

Apr 15, 2015, 1:02:29 AM
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
On Tue, Apr 14, 2015 at 9:50 PM, Michael Snoyman <mic...@snoyman.com> wrote:
Yes, I think you've summarized the security aspects of this nicely. There's also the reliability and availability guarantees we get from a distributed system, but that's outside the realm of security (unless you're talking about denial of service).

Is it possible to separate out the concept of trusted revisions from a distributed hackage (into 2 separate proposals) then?
If Hackage wanted to it could implement trusted revisions. Or some other (distributed or non-distributed) package service could implement it (as long as the installer tool knows to check for revisions there, perhaps this would be added to Chris's signing tooling).

Michael Snoyman

Apr 15, 2015, 1:08:44 AM
to Greg Weber, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
On Wed, Apr 15, 2015 at 8:02 AM Greg Weber <gr...@gregweber.info> wrote:
On Tue, Apr 14, 2015 at 9:50 PM, Michael Snoyman <mic...@snoyman.com> wrote:
Yes, I think you've summarized the security aspects of this nicely. There's also the reliability and availability guarantees we get from a distributed system, but that's outside the realm of security (unless you're talking about denial of service).

Is it possible to separate out the concept of trusted revisions from a distributed hackage (into 2 separate proposals) then?
If Hackage wanted to it could implement trusted revisions. Or some other (distributed or non-distributed) package service could implement it (as long as the installer tool knows to check for revisions there, perhaps this would be added to Chris's signing tooling).
 

It would be a fundamental shift away from how Hackage does things today. I think the necessary steps would be:

1. Hackage ships all revisions to cabal files somehow (personally, I think it should be doing this anyway).
2. We have a list of trustees who are allowed to edit metadata. The signing work already has to recapture that information for allowed uploaders, since Hackage doesn't collect GPG keys.
3. Every time a revision is made, the person making the revision would need to sign the new revision

I'm open to other ideas, this is just what came to mind first.
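
As a very rough illustration of what step 3 might produce (a hypothetical record, not an actual Hackage or cabal format):

    type PackageName = String
    type Version     = String
    type KeyId       = String

    data SignedRevision = SignedRevision
      { revPackage   :: PackageName
      , revVersion   :: Version
      , revNumber    :: Int     -- 0 = original upload, 1+ = metadata edits
      , revCabalHash :: String  -- e.g. SHA512 of the revised .cabal file
      , revSigner    :: KeyId   -- must appear in the trustee/maintainer list
      , revSignature :: String  -- detached GPG signature over the fields above
      }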

Michael 

Gershom B

Apr 15, 2015, 1:19:17 AM
to Michael Snoyman, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
Ok, to narrow it down, you are concerned about the ability to

> * Properly authenticate users
> * Keep authorization lists of who can make uploads/revisions (and who can grant those rights)

and more specifically:

> * Currently, authorized uploaders are identified by a user name and a
> password on Hackage. How do we correlate that to a GPG key? Ideally, the
> central upload authority would be collecting GPG public keys for all
> uploaders so that signature verification can happen correctly.
> * There's no way for an outside authority to vet the 00-index.tar.gz file
> downloaded from Hackage; it's a completely opaque, black box. Having the
> set of authorization rules be publicly viewable, auditable, and verifiable
> overcomes that.

On 1) now you have the problem “what if the central upload authority’s store of GPG keys is violated”. You’ve just kicked the can. “Web of Trust” is not a tractable answer. My answer is simpler: I can verify that the signer of version 1 of a package is the same as the signer of version 0.1. This is no small trick. And I can do so orthogonal to hackage. Now, if I really want to verify that the signer of version 1 is the person who is “Michael Snoyman” and is in fact the exact Michael Snoyman I intend, then I need to get your key by some entirely other mechanism. And that is my problem, and, definitionally, no centralized store can help me in that regard unless I trust it absolutely — which is precisely what I don’t want to do.

On 2) I would like to understand more of what your concern with regards to “auditing” is. What specific information would you like to know that you do not? Improved audit logs seem again orthogonal to any of these other security concerns, unless you are simply worried about a “metadata only” attack vector. In any case, we can incorporate the same signing practices for metadata as for packages — orthogonal to hackage or any other particular storage mechanism. It is simply an unrelated question. And, honestly, compared to all the other issues we face I feel it is relatively minor (the signing component, not a better audit trail).

In any case, your account of the first two points reveals some of the confusion I think that remains:

> * Allow safe uploads of packages and metadata
> * Distribute packages and metadata to users safely

What is the definition of “safe” here? My understanding is that in the field of security one doesn’t talk about “safe” in general, but with regards to a particular profile of a sort of attacker, and always only as a difference of degree, not kind.

So who do we want to prevent from doing what? How “safe” is “safe”? Safe from what? From a malicious script-kid, from a malicious collective “in it for the lulz,” from a targeted attack against a particular end-client, from just poorly/incompetently written code? What are we “trusting”? What concrete guarantees would we like to make about user interactions with packages and package repositories?

While I’m interrogating language, let me pick out one other thing I don’t understand: “creating a coherent set of packages” — what do you mean by “coherent”? Is this something we can specify? Hackage isn’t supposed to be coherent — it is supposed to be everything. Within that “everything” we are now attempting to manage metadata to provide accurate dependency information, at a local level. But we have no claims about any global coherence conditions on the resultant graphs. Certainly we intend to be coherent in the sense that the combination of a name/version/revision should indicate one and only one thing (and that all revisions of a version should differ at most in dependency constraints in their cabal file) — but this is a fairly minimal criterion. And in fact, it is one that is nearly orthogonal to security concerns altogether.

What I’m driving at is — it sounds like we _mainly_ want new decentralized security mechanisms, at the cabal level, but we also want, potentially, a few centralized mechanisms. However, centralization is weakness from a security standpoint. So, ideally, we want as few centralized mechanisms as possible, and we want the consequences of those mechanisms being broken to be “recoverable” at the point of local verification.

Let me spell out a threat model where that makes sense. An adversary takes control of the entire hackage server through some zero day linux exploit we have no control over — or perhaps they are an employee at the datacenter where we host hackage and secure control via more direct means, etc. They have total and complete control over the box. They can accept anything they want, and they can serve anything they want. And they are sophisticated enough to be undetected for say a week.

Now, we want it to be the case that _whatever_ this adversary does, they cannot “trick” someone who types “cabal install warp” into instead cabal installing something malicious. How do we do so? _Now_ we have a security problem that is concrete enough to discuss. And furthermore, I would claim that if we don’t have at least some story for this threat model, then we haven’t established anything much “safer” at all.

This points towards a large design space, and a lot of potential ideas, all of which feel entirely different than the “strawman” proposal, since the emphasis there is towards the changes to a centralized mechanism (even if in turn, the product of that mechanism itself is then distributed and git cloneable or whatever).

Cheers,
Gershom

Greg Weber

Apr 15, 2015, 1:21:10 AM
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Perhaps this is not really doable, but I was thinking there should be a proposal for a specification for trusted revisions. These would be integration details for Hackage, just as the current proposal has some implementation details about a distributed package service.

I actually think the easiest way to make revisions secure with Hackage is to precisely limit what can be revised. If one can only change an upper bound of an existing dependency, that greatly limits the attack vectors.
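
A toy model of that restriction (deliberately not the real Cabal types, just to show how narrow the allowed diff would be):

    type PackageName = String
    type Version     = [Int]

    data Dep = Dep
      { depName  :: PackageName
      , depLower :: Version        -- inclusive lower bound
      , depUpper :: Maybe Version  -- exclusive upper bound, Nothing = unbounded
      } deriving (Eq, Show)

    -- Accept a revision only if the dependency list is otherwise identical:
    -- same packages, same order, same lower bounds; only upper bounds may move.
    upperBoundsOnly :: [Dep] -> [Dep] -> Bool
    upperBoundsOnly old new =
         length old == length new
      && and (zipWith sameExceptUpper old new)
      where
        sameExceptUpper o n = depName o == depName n && depLower o == depLower n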

Michael Snoyman

Apr 15, 2015, 1:43:42 AM
to Gershom B, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
On Wed, Apr 15, 2015 at 8:19 AM Gershom B <gers...@gmail.com> wrote:
Ok, to narrow it down, you are concerned about the ability to

> * Properly authenticate users
> * Keep authorization lists of who can make uploads/revisions (and who can grant those rights)

and more specifically:

> * Currently, authorized uploaders are identified by a user name and a
> password on Hackage. How do we correlate that to a GPG key? Ideally, the
> central upload authority would be collecting GPG public keys for all
> uploaders so that signature verification can happen correctly.
> * There's no way for an outside authority to vet the 00-index.tar.gz file
> downloaded from Hackage; it's a completely opaque, black box. Having the
> set of authorization rules be publicly viewable, auditable, and verifiable
> overcomes that.

On 1) now you have the problem “what if the central upload authority’s store of GPG keys is violated”. You’ve just kicked the can. “Web of Trust” is not a tractable answer. My answer is simpler: I can verify that the signer of version 1 of a package is the same as the signer of version 0.1. This is no small trick. And I can do so orthogonal to hackage. Now, if I really want to verify that the signer of version 1 is the person who is “Michael Snoyman” and is in fact the exact Michael Snoyman I intend, then I need to get your key by some entirely other mechanism. And that is my problem, and, definitionally, no centralized store can help me in that regard unless I trust it absolutely — which is precisely what I don’t want to do.


You've ruled out all known solutions to the problem, therefore no solution exists ;)

To elaborate slightly: the issue of obtaining people's keys is a problem that exists in general, and has two main resolutions: a central authority, and a web of trust. You've somehow written off completely the web of trust (I'm not sure *why* you think that's a good idea, you haven't explained it), and then stated that- since the only remaining option is a central authority- it's no better than Hackage. I disagree:

1. Maintaining security of a single GPG key is much simpler than maintaining the security of an entire web application, as is currently needed by Hackage.
2. There's no reason we need an either/or setup: we can have a central authority sign keys. If users wish to trust that authority, they may do so, and thereby get access to other keys. If that central authority is compromised, we revoke that authority and move on to another one. Importantly: we haven't put all our eggs in one basket, as is done today.
 
On 2) I would like to understand more of what your concern with regards to “auditing” is. What specific information would you like to know that you do not? Improved audit logs seem again orthogonal to any of these other security concerns, unless you are simply worried about a “metadata only” attack vector. In any case, we can incorporate the same signing practices for metadata as for packages — orthogonal to hackage or any other particular storage mechanism. It is simply an unrelated question. And, honestly, compared to all the other issues we face I feel it is relatively minor (the signing component, not a better audit trail).


There's a lot of stuff going on inside of Hackage which we have no insight into or control over. The simplest is that we can't review a log of revisions. Improving that is a good thing, and I hope Hackage does so. Nonetheless, I'd still prefer a fully open, auditable system, which isn't possible with "just tack it on to Hackage."
 
In any case, your account of the first two points reveals some of the confusion I think that remains:

> * Allow safe uploads of packages and metadata
> * Distribute packages and metadata to users safely

What is the definition of “safe” here? My understanding is that in the field of security one doesn’t talk about “safe” in general, but with regards to a particular profile of a sort of attacker, and always only as a difference of degree, not kind.


I didn't think this needed diving into, because the problems seem so fundamental they weren't worth explaining. Examples of safety issues are:

* An attacker sitting between an uploader and Hackage can replace the package contents with something nefarious, corrupting the package for all downloaders
* An attacker sitting between a downloader and Hackage can replace the package contents with something nefarious, corrupting the package for that downloader
* This doesn't even have to be a conscious attack; I saw someone on Reddit report that they tried to download a package at an airport WiFi, and instead ended up downloading the HTML "please log in" page
* Eavesdropping attacks on uploaders: it's possible to capture packets indicating upload headers to Hackage, such as when using open WiFi (think the airport example again). Those headers include authorization headers. Thanks to Hackage now using digest authentication, this doesn't lead to an immediate attack, but digest authentication is based on MD5, which is not the most robust hash function
* Normal issues with password based authentication: insecure passwords, keyloggers, etc.
* Vulnerabilities in the Hackage codebase or its hosting that expose passwords and/or allow arbitrary uploads
 
So who do we want to prevent from doing what? How “safe” is “safe”? Safe from what? From a malicious script-kid, from a malicious collective “in it for the lulz,” from a targeted attack against a particular end-client, from just poorly/incompetently written code? What are we “trusting”? What concrete guarantees would we like to make about user interactions with packages and package repositories?

While I’m interrogating language, let me pick out one other thing I don’t understand: "creating a coherent set of packages” — what do you mean by “coherent”? Is this something we can specify? Hackage isn’t supposed to be coherent — it is supposed to be everything. Within that “everything” we are now attempting to manage metadata to provide accurate dependency information, at a local level. But we have no claims about any global coherence conditions on the resultant graphs. Certainly we intend to be coherent in the sense that the combination of a name/version/revision should indicate one and only one thing (and that all revisions of a version should differ at most in dependency constraints in their cabal file) — but this is a fairly minimal criteria. And in fact, it is one that is nearly orthogonal to security concerns altogether.


All I meant is a set of packages uploaded by an approved set of uploaders, as opposed to allowing in arbitrary modifications used by others.
 
What I’m driving at is — it sounds like we _mainly_ want new decentralized security mechanisms, at the cabal level, but we also want, potentially, a few centralized mechanisms. However, centralization is weakness from a security standpoint. So, ideally, we want as few centralized mechanisms as possible, and we want the consequences of those mechanisms being broken to be “recoverable” at the point of local verification.


Yes, that's exactly the kind of goal I'm aiming towards.
 
Let me spell out a threat model where that makes sense. An adversary takes control of the entire hackage server through some zero day linux exploit we have no control over — or perhaps they are an employee at the datacenter where we host hackage and secure control via more direct means, etc. They have total and complete control over the box. They can accept anything they want, and they can serve anything they want. And they are sophisticated enough to be undetected for say a week.

Now, we want it to be the case that _whatever_ this adversary does, they cannot “trick” someone who types “cabal install warp” into instead cabal installing something malicious. How do we do so? _Now_ we have a security problem that is concrete enough to discuss. And furthermore, I would claim that if we don’t have at least some story for this threat model, then we haven’t established anything much “safer” at all.

This points towards a large design space, and a lot of potential ideas, all of which feel entirely different than the “strawman” proposal, since the emphasis there is towards the changes to a centralized mechanism (even if in turn, the product of that mechanism itself is then distributed and git cloneable or whatever).


If we have agreement that the problem exists, I'm quite happy to flesh out other kinds of attack vectors and then discuss solutions. Again, my proposal is purely meant to be a starting point for discussion, not an answer to the problems.

Michael

Gershom B

Apr 15, 2015, 1:50:44 AM
to Michael Snoyman, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
On April 15, 2015 at 1:43:42 AM, Michael Snoyman (mic...@snoyman.com) wrote:
> > There's a lot of stuff going on inside of Hackage which we have
> no insight into or control over. The simplest is that we can't
> review a log of revisions. Improving that is a good thing, and
> I hope Hackage does so. Nonetheless, I'd still prefer a fully
> open, auditable system, which isn't possible with "just tack
> it on to Hackage.”

Ok, I’m going to ignore everything else and just focus on this, because it seems to be the only thing related to hackage, and therefore should be thought of separately from everything else.

What _else_ goes on that “we have no insight or control over”? Can we document the full list? Can we specify what we mean by insight? I take that to mean auditability. Can we specify what we mean by “control”? (There I have no idea.)

(With regards to revision logs, revisions are still a relatively new feature and there’s lots of bits and bobs missing, and I agree this is low hanging fruit to improve).

—Gershom



Michael Snoyman

Apr 15, 2015, 1:57:07 AM
to Gershom B, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
I'm not intimately familiar with the Hackage API, so I can't give a point-by-point description of what information is and is not auditable. However, *all* of that is predicated on trusting Hackage to properly authenticate users and be immune to attacks. For example, even if I can ask Hackage who uploaded a certain package/version, there's no way I can audit that that's actually the case, besides going and asking that person. And I can't even do *that* reliably, since the only identification for an uploader is the Hackage username, and I can't verify that someone actually owns that username without asking for his/her password also.

One feature Hackage could add that would make the latter a bit better would be to verify identity claims from people (ala OpenID), though that still leaves us in the position of needing to fully trust Hackage.

Michael

Gershom B

Apr 15, 2015, 2:14:53 AM
to Michael Snoyman, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
On April 15, 2015 at 1:57:07 AM, Michael Snoyman (mic...@snoyman.com) wrote:
> I'm not intimately familiar with the Hackage API, so I can't give a
> point-by-point description of what information is and is not auditable.

Okay, then why did you write "There's a lot of stuff going on inside of Hackage which we have no insight into or control over.”?

I would very much like to have a clarifying discussion, as you are gesturing towards some issue we should think about. But it is difficult when you make broad claims, and are not able to explain what they mean.

Cheers,
Gershom

Michael Snoyman

unread,
Apr 15, 2015, 2:47:16 AM4/15/15
to Gershom B, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
I think you're reading too much into my claims, and specifically on the unimportant aspects of them. I can clarify these points, but I think drilling down deeper is a waste of time. To answer this specific question:

* There's no clarity on *why* a change was approved. I see that person X uploaded a revision, but why was person X allowed to do so?
* I know of no way to see the history of authorization rules.
    * Was JohnDoe always a maintainer of foobar, or was that added at some point?
    * Who added this person as a maintainer?
    * Who gave this other person trustee power? Who took it away?

All of these things would come for free with an open system where authorization rules are required to be encoded in a freely viewable file, and signatures are used to verify the data.

And to be clear, to make sure no one thinks I'm saying otherwise: I don't think Hackage has done anything wrong by approaching things the way it has until now. I probably would have come up with a very similar system. I'm talking about new functionality and requirements that weren't stated for the original system. Don't take this as "Hackage is bad," but rather, "time to batten down the hatches."

Michael

Carter Schonwald

Apr 15, 2015, 8:20:23 AM
to Michael Snoyman, Gershom B, den...@deathbytape.com, commerci...@googlegroups.com, arn...@capital-match.com, haskell-inf...@community.galois.com, haskel...@haskell.org

Ok, let me counter that with a simpler idea: every Hackage edit action has an explanation field that the trustee can choose to optionally write some text in

And additionally: there is a globally visible feed / log of all Hackage edits.

I believe some folks are working to add those features to hackage this spring. 

I am emphatically against stronger security things being tacked on top without a threat model that precisely justifies why. Recent experience has shown me that organizations which mandate processes in the name of a nebulous security model counterintuitively become less secure and less effective.

Let me repeat myself: enterprise-sounding security processes should only be adopted in the context of a concrete threat model that actually, specifically motivates the applicable security model. Anything else is a kiss of death. Please be concrete. Additionally, specificity allows us to think of approaches that can be both secure and easy to use.

Michael Snoyman

unread,
Apr 15, 2015, 8:34:07 AM4/15/15
to Carter Schonwald, Gershom B, den...@deathbytape.com, commerci...@googlegroups.com, arn...@capital-match.com, haskell-inf...@community.galois.com, haskel...@haskell.org
I've given plenty of concrete attack vectors in this thread. I'm not going to repeat all of them here. But addressing your "simpler idea": how do we know that the claimed person actually performed that action? If Hackage is hacked, there's no way to verify *any* such log. With a crypto-based system, we know specifically which key is tied to which action, and can invalidate those actions in the case of a key becoming compromised.

There are no nebulous claims going on here. Hackage is interacting with users in a way that is completely susceptible to MITM attacks. That's a fact, and an easily exploitable attack vector for someone in the right position in the network. I'm also precisely *not* recommending we tack security things on top: I'm proposing we design a secure system from the ground up.

Also, if we're going to talk about nebulous, let's start with the word "enterprise sounding." That's an empty criticism, and I should hope we're above that kind of thing.

Carter Schonwald

Apr 15, 2015, 9:09:24 AM
to Michael Snoyman, Gershom B, commerci...@googlegroups.com, den...@deathbytape.com, arn...@capital-match.com, haskell-inf...@community.galois.com, haskel...@haskell.org

Ok. Let's get https support into cabal. 

How do we best go about doing that? 

Michael Snoyman

unread,
Apr 15, 2015, 9:11:22 AM4/15/15
to Carter Schonwald, Gershom B, commerci...@googlegroups.com, den...@deathbytape.com, arn...@capital-match.com, haskell-inf...@community.galois.com, haskel...@haskell.org
I'm 100% in favor of that. Last time it was brought up, we ended up in a debate about the Haskell Platform and the PVP, which left the relevant package authors not wanting to get involved. If someone starts the conversation up, I will fully support it.

That will fix the largest problem we have. It still means we're placing all of our trust in Hackage, which sets up a single point of failure. We can, and should, do better than that.


Carter Schonwald

Apr 15, 2015, 9:12:29 AM
to Michael Snoyman, Gershom B, den...@deathbytape.com, commerci...@googlegroups.com, arn...@capital-match.com, haskell-inf...@community.galois.com, haskel...@haskell.org

A cryptographically unforgeable Hackage log is an interesting idea. I'll have to think about what that means, though.
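
One shape such a log could take is a simple hash chain, where each entry commits to the one before it; a minimal sketch (SHA256 via cryptonite chosen purely for illustration; the actual hash and signing scheme is exactly the part that needs thought):

    {-# LANGUAGE OverloadedStrings #-}
    import           Crypto.Hash           (Digest, SHA256, hashWith)
    import           Data.ByteString       (ByteString)
    import qualified Data.ByteString.Char8 as BC

    data LogEntry = LogEntry
      { entryAction   :: ByteString    -- e.g. "revise foo-1.2.3 cabal file"
      , entrySigner   :: ByteString    -- key id of whoever performed the action
      , entryPrevHash :: Digest SHA256 -- hash of the previous entry
      }

    -- The next entry stores this hash, so silently rewriting any earlier
    -- entry invalidates every later one.
    entryHash :: LogEntry -> Digest SHA256
    entryHash e = hashWith SHA256 $ BC.concat
      [entryAction e, "|", entrySigner e, "|", BC.pack (show (entryPrevHash e))]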

Gershom B

Apr 15, 2015, 9:13:28 AM
to Michael Snoyman, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, commerci...@googlegroups.com, haskell-inf...@community.galois.com, arn...@capital-match.com
On April 15, 2015 at 8:34:07 AM, Michael Snoyman (mic...@snoyman.com) wrote:
> I've given plenty of concrete attack vectors in this thread. I'm not going
> to repeat all of them here. But addressing your "simpler idea": how do we
> know that the claimed person actually performed that action? If Hackage is
> hacked, there's no way to verify *any* such log. With a crypto-based
> system, we know specifically which key is tied to which action, and can
> invalidate those actions in the case of a key becoming compromised.

So amend Carter’s proposal with the requirement that admin/trustee actions be signed as well. Now we can audit the verification trail. Done.

But let me pose a more basic question: Assume somebody falsified the log, but could _not_ falsify any package contents (because the latter were verified at the use site). And further, assume we had a signing trail for revisions as well. Now what is the worst that this bad actor could accomplish? 

This is why it helps to have a “threat model”. I think there is a misunderstanding here on what Carter is asking for. A “threat model” is not a list of potential vulnerabilities. Rather, it is a statement of what types of things are important to mitigate against, and from whom. There is no such thing as a completely secure system, except, perhaps an unplugged one. So when you say you want something “safe” and then tell us ways the current system is “unsafe” then that’s not enough. We need to have a criterion by which we _could_ judge a future system at least “reasonably safe enough”.

My sense of a threat model prioritizes package signing (and I guess revision signing now too) but e.g. doesn’t consider a signed verifiable audit trail a big deal, because falsifying those logs doesn’t easily translate into an attack vector.

You are proposing large, drastic changes. Such changes are likely to get bogged down and fail, especially to the degree they involve designing systems in ways that are not in widespread use already. And even if such changes were feasible, and even if they were a sound approach, it would take a long time to put the pieces together to carry them out smoothly across the ecosystem.

Meanwhile, if we can say “in fact this problem decomposes into six nearly unrelated problems” and then prioritize those problems, it is likely that all can be addressed incrementally, which means less development work, greater chance of success, and easier rollout. I remain convinced that you raise some genuine issues, but they decompose into nearly unrelated problems that can and should be tackled individually.

Cheers,
Gershom


Dennis J. McWherter, Jr.

unread,
Apr 15, 2015, 9:24:47 AM4/15/15
to commerci...@googlegroups.com, haskell-inf...@community.galois.com, arn...@capital-match.com, haskel...@haskell.org
As far as the threat model is concerned, I believe the major concern is using "untrusted" code (for the definition of untrusted such that the source is not the author you expected). Supposing this group succeeds in facilitating greater commercial adoption of Haskell, one of the easiest vectors (at this moment) for breaking someone's Haskell-based system is to simply swap in a modified version of a library containing an exploit.

That said, we should also recognize this as a general problem. Some ideas on package manager attacks are at [1].

Further, I see what Gershom is saying about gaining adoption within the current community. However, I wonder (going off of his thought about decomposing the problem) if the system for trust could be generic enough to integrate into an existing solution to help mitigate this risk.

Michael Snoyman

unread,
Apr 15, 2015, 9:27:55 AM4/15/15
to Gershom B, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, commerci...@googlegroups.com, haskell-inf...@community.galois.com, arn...@capital-match.com
I think you've missed what I've said, so I'll try to say it more clearly: we have no insight right now into how Hackage makes decisions about who's allowed to upload and revise packages. We have no idea how to make a correspondence between a Hackage username and some externally-verifiable identity (like a GPG public key). In that world: how can we externally verify signatures of packages on Hackage?

I'm pretty familiar with Chris's package signing work. It's a huge step forward. But by necessity of the weaknesses in what Hackage is exposing, we have no way of fully verifying all signatures.

If you see the world differently, please explain. Both you and Carter seem to assume I'm talking about some other problem that's not yet been described. I'm just trying to solve the problem already identified. I think you've missed a few steps necessary to have a proper package signing system in place.

You may think that the proposal I've put together is large and a massive shift. It's honestly the minimal number of changes I can see towards having a method to fully verify all signatures of packages that Hackage is publishing. If you see a better way to do it, I'd rather do that, so tell me what it is.

Michael

* * *

I think the above was clear enough, but in case it's not, here's an example. Take the yesod-core package, for which MichaelSnoyman and GregWeber are listed as maintainers. Suppose that we have information from Hackage saying:

yesod-core-1.4.0 released by MichaelSnoyman
yesod-core-1.4.1 released by FelipeLessa
yesod-core-1.4.2 released by GregWeber
yesod-core-1.4.2 cabal file revision by HerbertValerioRiedel

How do I know:

* Which signatures on yesod-core-1.4.0 to trust? Should I trust MichaelSnoyman's and GregWeber's only? What if GregWeber wasn't a maintainer when 1.4.0 was released?
* How can 1.4.1 be trusted? It was released by a non-maintainer. In reality, we can guess that FelipeLessa used to be a maintainer but was then removed, but how do we know this?
* Similarly, we can guess that HerbertValerioRiedel is granted as a trustee the right to revise a cabal file.
* But in any event: how do we get the GPG keys for any of these users?
* And since Hackage isn't enforcing any GPG signatures, what should we do when the signatures for a package don't exist?

This is just one example of the impediments to adding package signing to the current Hackage system.

Gershom B

unread,
Apr 15, 2015, 9:46:08 AM4/15/15
to Michael Snoyman, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, commerci...@googlegroups.com, haskell-inf...@community.galois.com, arn...@capital-match.com
On April 15, 2015 at 9:27:55 AM, Michael Snoyman (mic...@snoyman.com) wrote:
> I think the above was clear enough, but in case it's not, here's an
> example. Take the yesod-core package, for which MichaelSnoyman and
> GregWeber are listed as maintainers. Suppose that we have information from
> Hackage saying:
>
> yesod-core-1.4.0 released by MichaelSnoyman
> yesod-core-1.4.1 released by FelipeLessa
> yesod-core-1.4.2 released by GregWeber
> yesod-core-1.4.2 cabal file revision by HerbertValerioRiedel
>
> How do I know:
>
> * Which signatures on yesod-core-1.4.0 to trust? Should I trust
> MichaelSnoyman's and GregWeber's only? What if GregWeber wasn't a
> maintainer when 1.4.0 was released?
> * How can 1.4.1 be trusted? It was released by a non-maintainer. In
> reality, we can guess that FelipeLessa used to be a maintainer but was then
> removed, but how do we know this?
> * Similarly, we can guess that HerbertValerioRiedel is granted as a trustee
> the right to revise a cabal file.
> * But in any event: how do we get the GPG keys for any of these users?
> * And since Hackage isn't enforcing any GPG signatures, what should we do
> when the signatures for a package don't exist?
>
> This is just one example of the impediments to adding package signing to
> the current Hackage system.

None of this makes sense to me. You should trust whoever’s keys you choose to trust. That is your problem. I can’t tell you who to trust. How do you get the GPG key? Well that is also your problem. We can’t implement our own service for distributing GPG keys. That’s nuts.

Why should your trust in a package be based on whether a “maintainer” or a “non-maintainer” released it? That’s a bad criterion. How can I trust 1.4.0? Perhaps somebody paid you a lot of money to insert a hole in it. I trust it if I trust _you_, not if I trust that you were listed as “maintainer” for a fragment of time.

I think you are confusing the maintainer field to mean something other than it does — which is simply the list of people authorized at some point in time to upload a package. 

In the future, we can encourage signing, at first optionally and later on a stricter basis, and eventually enforce it. I think this is a good idea.

But, and here we apparently disagree completely, it seems to me that everything else is not and should not be the job of a centralized server.

Now, on this count:

> we have no insight right now into how Hackage makes decisions about who's allowed to upload and revise packages.

This is weird. “Hackage” doesn’t make decisions. People do. Hackage is just a program, run on a machine. It enforces permissioning. Those permissions can be viewed. So I can tell you who the trustees are, who the admins are, and who the maintainers are for any given package. If any of the information about how these permissions are granted, by whom, and when, is not logged (and some that should be logged currently isn’t, I’ll grant), then we can amend the codebase to log it. If we wish, we can make the log verifiable via an audit trail. We can also make admin actions verifiable. This is precisely what Carter proposed (with my amendment).

On

> We have no idea how to make a correspondence between a Hackage username and some externally-verifiable identity (like a GPG public key). In that world: how can we externally verify signatures of packages on Hackage?

My proposal again is simple — treat the hackage username as a convenience, not as anything fundamental to the verification model. Treat verification entirely independently. Assume I have a GPG key for Michael Snoyman. How do I know a certain version of yesod is due to him? I don’t need to ask Hackage at all. I just check that he signed it with his key. That’s all that matters, right?
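In concrete terms that check is tiny. A rough sketch, assuming a detached .asc signature distributed alongside the sdist tarball and a gpg binary on the PATH (this is an illustration, not existing cabal functionality):

```
import System.Exit    (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Check a detached GPG signature over a package tarball. Whether the
-- signing key itself is trusted (web of trust, a manually checked
-- fingerprint, ...) remains entirely the user's decision.
verifySdist :: FilePath -> FilePath -> IO Bool
verifySdist sigFile tarball = do
  (code, _out, _err) <-
    readProcessWithExitCode "gpg" ["--verify", sigFile, tarball] ""
  return (code == ExitSuccess)
```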

—Gershom

Mathieu Boespflug

unread,
Apr 15, 2015, 10:17:32 AM4/15/15
to Gershom B, Michael Snoyman, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, commerci...@googlegroups.com, haskell-inf...@community.galois.com, arn...@capital-match.com
> In the future, we can at first optionally, and then later on a stricter basis encourage and then enforce signing. I think this is a good idea.
>
> But, and here we apparently disagree completely, it seems to me that everything else is not and should not be the job of a centralized server.

Actually, I think you and Michael are in violent *agreement* on this
particular point. At the core of the gist that was pointed to earlier
in this thread [1], is the idea that we should have some kind of
central notepad, where anyone is allowed to scribble anything they
like, even add pointers to packages that are completely broken, don't
build, or are malicious trojan horses. Then, it's up to end users to
filter out the wheat from the chaff. In particular, it's up to the
user to pretend those scribbles that were added by untrusted sources
were just never there, *according to the users own trust model*. The
central notepad does not enforce any particular trust model. It just
provides sufficient mechanism so that the information necessary to
support common trust models, such as WoT of GPG keys, can be uploaded
and/or pointed to and found.

In this way, any trust model can be supported. We could refactor
Hackage on top of this notepad, and have Hackage upload metadata about
those scribbles that *it* thinks are legit, say because Hackage
performed the scribble itself on behalf of some user, but only did so
after authenticating said user, according to its own notion of
authentication.

Users are free to say "I trust any scribble to the notepad about any
package that was added by an authenticated Hackage user". Or "I only
trust scribbles from my Haskell friends whom I have met at ICFP and on
that occasion exchanged keys". Or a union of both. Or anything else
really.
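In code, a user's trust model is then nothing more than a predicate over
signed notepad entries. A minimal sketch with made-up types (none of this
is an existing API, it only illustrates how such policies compose):

```
newtype KeyId = KeyId String deriving Eq

-- A signed entry in the notepad: a package identifier plus the keys
-- that signed the scribble.
data Entry = Entry
  { entryPackage :: String    -- e.g. "foo-1.0.1"
  , entrySigners :: [KeyId]
  }

-- A trust policy is just a predicate over entries, chosen by the user.
type Policy = Entry -> Bool

-- "Trust any scribble added by Hackage on behalf of an authenticated user."
trustHackage :: KeyId -> Policy
trustHackage hackageKey e = hackageKey `elem` entrySigners e

-- "Trust scribbles from friends I exchanged keys with at ICFP."
trustFriends :: [KeyId] -> Policy
trustFriends friends e = any (`elem` entrySigners e) friends

-- Policies compose: accept an entry if either one accepts it.
orElse :: Policy -> Policy -> Policy
orElse p q e = p e || q e
```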

[1] https://gist.github.com/snoyberg/732aa47a5dd3864051b9

Mathieu Boespflug

unread,
Apr 15, 2015, 4:19:13 PM4/15/15
to Andrey Sverdlichenko, Gershom B, haskell-inf...@community.galois.com, arn...@capital-match.com, commerci...@googlegroups.com, haskel...@haskell.org
> Now, if I get it right, you want to allow anyone to upload foo-1.0.1
> to hackage and let the user sort out whether he trusts this update. It
> will never work: for all we know about security, when asked "do you
> trust this package's signature?" the user will either get annoyed,
> shrug and click "Yes", or if paranoid, get annoyed and go away. He
> just does not know enough to make the decisions you are asking him to
> make. And adding a vector implementation with something malicious in
> its build script just became a matter of "cabal upload".
> If you build such a system, you have to provide it with a reasonable
> set of defaults, and that is where the "we are in the business of key
> distribution" thing raises its head again.

The above all sounds reasonable to me. Note however that, not that I'm
convinced that this is a good default, but the default here could well
be: "trust whatever was added to the notepad by Hackage on behalf of
some authenticated user". That gets us back to today's status quo. We
can do better of course, but just to say that this "notepad" can be
completely transparent to end users, depending only on tooling. One
way to know what is added by Hackage is to have Hackage sign whatever
it writes in the notepad, using some key or certificate that the
tooling trusts by default.

That's really the baseline. As I said, we can do much better. At least
with the central notepad approach, we're pushing policy about what
packages to trust down to end user tooling, which can be swapped in
and out at will, without having some central entity dictate a weaker
and redundant policy.

I agree with Gershom's sentiment that the policy for what to trust
should be left open. It should be left up to the user. One family of
policies, already discussed in this thread, is a GPG WoT. That family
of policies may or may not fly for all users, I don't know, but at
least the tooling for that already exists, and it's easy to put in
place. Another family is implementations of The Update Framework
suggested by Duncan here:

https://groups.google.com/d/msg/commercialhaskell/qEEJT2LDTMU/_uj0v5PbIA8J

I'm told others are working along similar lines. It'd be great if
those people came out of the woodwork to talk about what they're doing
in the open. And since there's clearly interest in safer package
distribution, formulate proposals with a comment about how they fare
against a specific threat model, such as the 9 attacks listed in this
paper (same authors as the links posted twice previously in this thread):

ftp://ftp.cs.arizona.edu/reports/2008/TR08-02.pdf

That list of attacks is a superset of the attacks listed by Michael
previously. Not all policies will address all attacks in a way that's
entirely satisfactory, but at least with the central notepad we can
easily evolve them over time.

Duncan Coutts

unread,
Apr 16, 2015, 5:34:04 AM4/16/15
to Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Hi folks,

As I mentioned previously on the commercialhaskell list, we're working
on Hackage security for the IHG at the moment.

We've finally written up the design for that as a blog post:

http://www.well-typed.com/blog/2015/04/improving-hackage-security

It includes a section at the end comparing in general terms to this
proposal (specifically Chris's part on package signing).

The design is basically "The Update Framework" for Hackage. Our current
implementation effort for the IHG covers the first part of that design.

http://theupdateframework.com/

I think TUF addresses many of the concerns that have been raised in this
thread, e.g. about threat models, what signatures actually mean etc.

It also covers the question of making the "who's allowed to upload what"
information transparent, with proper cryptographic evidence (albeit
that's in the second part of the design).

So if collectively we can also implement the second part of TUF for
Hackage then I think we can address these issues properly.

Other things worth noting:
* This will finally allow us to have untrusted public mirrors,
which is the traditional approach to improving repository
reliability.
* We're incorporating an existing design for incremental updates
of the package index to significantly improve "cabal update"
times.

I'll chip in elsewhere in this thread with more details about how TUF
(or our adaptation of it for hackage) solves some of the problems raised
here.

Duncan
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

Michael Snoyman

unread,
Apr 16, 2015, 5:53:01 AM4/16/15
to Duncan Coutts, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Thanks for responding, I intend to go read up on TUF and your blog post now. One question:


      * We're incorporating an existing design for incremental updates
        of the package index to significantly improve "cabal update"
        times.

Can you give any details about what you're planning here? I put together a Git repo already that has all of the cabal files from Hackage and which updates every 30 minutes, and it seems that, instead of reinventing anything, simply using `git pull` would be the right solution here:

https://github.com/commercialhaskell/all-cabal-files

Duncan Coutts

unread,
Apr 16, 2015, 6:12:51 AM4/16/15
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
On Thu, 2015-04-16 at 09:52 +0000, Michael Snoyman wrote:
> Thanks for responding, I intend to go read up on TUF and your blog post
> now. One question:
>
> * We're incorporating an existing design for incremental updates
> of the package index to significantly improve "cabal update"
> times.
>
> Can you give any details about what you're planning here?

Sure, it's partially explained in the blog post.

> I put together a
> Git repo already that has all of the cabal files from Hackage and which
> updates every 30 minutes, and it seems that, instead of reinventing
> anything, simply using `git pull` would be the right solution here:
>
> https://github.com/commercialhaskell/all-cabal-files

It's great that we can mirror to lots of different formats so
easily :-).

I see that we now have two hackage mirror tools, one for mirroring to a
hackage-server instance and one for S3. The bit I think is missing is
mirroring to a simple directory based archive, e.g. to be served by a
normal http server.

From the blog post:

The trick is that the tar format was originally designed to be
append only (for tape drives) and so if the server simply
updates the index in an append only way then the clients only
need to download the tail (with appropriate checks and fallback
to a full update). Effectively the index becomes an append only
transaction log of all the package metadata changes. This is
also fully backwards compatible.

The extra detail is that we can use HTTP range requests. These are
supported on pretty much all dumb/passive http servers, so it's still
possible to host a hackage archive on a filesystem or ordinary web
server (this has always been a design goal of the repository format).

We use an HTTP range request to get the tail of the tarball, so we only
have to download the data that has been added since the client last
fetched the index. This is obviously much much smaller than the whole
index. For safety (and indeed security) the final tarball content is
checked to make sure it matches up with what is expected. Resetting and
changing files earlier in the tarball is still possible: if the content
check fails then we have to revert to downloading the whole index from
scratch. In practice we would not expect this to happen except when
completely blowing away a repository and starting again.
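As a rough illustration of the client side (this is not our actual
implementation; the index URL and the fallback handling here are
placeholders), the incremental fetch boils down to a single Range
request using e.g. http-client:

```
import           Network.HTTP.Client
import           Network.HTTP.Types     (hRange, statusCode)
import qualified Data.ByteString.Char8  as BS
import qualified Data.ByteString.Lazy   as LBS

-- Ask the server for everything past the bytes we already hold locally.
-- A 206 Partial Content response gives us just the new tail, which we
-- append and then hash-check; anything else means the server ignored
-- the Range header and we fall back to downloading the whole index.
fetchIndexTail :: Manager -> String -> Int -> IO (Maybe LBS.ByteString)
fetchIndexTail mgr indexUrl localSize = do
  req <- parseRequest indexUrl
  let rangeHdr = (hRange, BS.pack ("bytes=" ++ show localSize ++ "-"))
  resp <- httpLbs req { requestHeaders = rangeHdr : requestHeaders req } mgr
  return $ if statusCode (responseStatus resp) == 206
             then Just (responseBody resp)
             else Nothing
```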

The advantage of this approach compared to others like rsync or git is
that it's fully compatible with the existing format and existing
clients. It's also in the typical case a smaller download than rsync and
probably similar or smaller than git. It also doesn't need much new from
the clients, they just need the same tar, zlib and HTTP features as they
have now (e.g. in cabal-install) and don't have to distribute
rsync/git/etc binaries on other platforms (e.g. windows).

That said, I have no problem whatsoever with there being git or rsync
based mirrors. Indeed the central hackage server could provide an rsync
point for easy setup for public mirrors (including the package files).

Michael Snoyman

unread,
Apr 16, 2015, 6:32:08 AM4/16/15
to Duncan Coutts, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
I don't like this approach at all. There are many tools out there that do a good job of dealing with incremental updates. Instead of using any of those, the idea is to create a brand new approach, implement it in both Hackage Server and cabal-install (two projects that already have a massive bug deficit), and roll it out hoping for the best. There's no explanation here as to how you'll deal with things like cabal file revisions, which are very common these days and seem to necessitate redownloading the entire database in your proposal.

Here's my proposal: use Git. If Git isn't available on the host, then revert to the current codepath and download the index. We can roll that out in an hour of work and everyone gets the benefits, without the detriments of creating a new incremental update framework.

Also: it seems like your biggest complaint about Git is "distributing Git." Making Git an optional upgrade is one way of solving that. Another approach is: don't use the official Git command line tool, but one of the many other implementations out there that implement the necessary subset of functionality. I'd guess writing that functionality from scratch in Cabal would be a comparable amount of code to what you're proposing.

Comments on package signing to be continued later, I haven't finished reading it yet.

Michael 

Duncan Coutts

unread,
Apr 16, 2015, 6:57:57 AM4/16/15
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
I looked at other incremental HTTP update approaches that would be
compatible with the existing format and work with passive http servers.
There's one rsync-like thing over http but the update sizes for our case
would be considerably larger than this very simple "get the tail, check
the secure hash is still right". This approach is minimally disruptive,
compatible with the existing format and clients.

> There's no explanation here as to how you'll deal with things like
> cabal file revisions, which are very common these days and seem to
> necessitate redownloading the entire database in your proposal.

The tarball becomes append only. The tar format works in this way;
updated files are simply appended. (This is how incremental backups to
tape drives worked in the old days, using the tar format). So no, cabal
file revisions will be handled just fine, as will other updates to other
metadata. Indeed we get the full transaction history.
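To make that concrete, here's a sketch of what publishing a cabal file
revision amounts to, using the tar package (the paths are made up for
illustration; this is not the actual hackage-server code):

```
import qualified Codec.Archive.Tar as Tar

-- Append the revised .cabal file to the end of the existing index.
-- Tar readers treat the last entry for a given path as authoritative,
-- so the revision shadows the original entry while all earlier bytes
-- of the index (i.e. the transaction history) remain untouched.
publishRevision :: IO ()
publishRevision =
  Tar.append "00-index.tar"                         -- existing index tarball
             "metadata"                             -- base directory
             ["yesod-core/1.4.2/yesod-core.cabal"]  -- revised file, relative to base
```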

> Here's my proposal: use Git. If Git isn't available on the host, then
> revert to the current codepath and download the index. We can roll that out
> in an hour of work and everyone gets the benefits, without the detriments
> of creating a new incremental update framework.

I was not proposing to change the repository format significantly (and
only in a backwards compatible way). The existing format is pretty
simple, using standard old well understood formats and protocols with
wide tool support.

The incremental update is fairly unobtrusive. Passive http servers don't
need to know about it, and clients that don't know about it can just
download the whole index as they do now.

The security extensions for TUF are also compatible with the existing
format and clients.

Michael Snoyman

unread,
Apr 16, 2015, 7:18:32 AM4/16/15
to Duncan Coutts, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
The theme you seem to be creating here is "compatible with current format." You didn't say it directly, but you've strongly implied that, somehow, Git isn't compatible with existing tooling. Let me make clear that that is, in fact, false[1]:

```
#!/bin/bash

set -e
set -x

# Location where cabal-install expects the package index.
DIR=$HOME/.cabal/packages/hackage.haskell.org
TAR=$DIR/00-index.tar
TARGZ=$TAR.gz

# Incrementally update the all-cabal-files checkout (run from inside it).
git pull
mkdir -p "$DIR"

rm -f "$TAR" "$TARGZ"

# Rebuild the index tarball cabal expects from the git tree.
git archive --format=tar -o "$TAR" master
gzip -k "$TAR"
```

I wrote this in 5 minutes. My official proposal is to add code to `cabal` which does the following:

1. Check for the presence of the `git` executable. If not present, download the current tarball
2. Check for existence of ~/.cabal/all-cabal-files (or similar). If present, run `git pull` inside of it. If absent, clone it
3. Run the equivalent of the above shell script to produce the 00-index.tar file (not sure if the .gz is also used by cabal)

This seems like such a drastically simpler solution than using byte ranges, modifying Hackage to produce tarballs in an append-only manner, and setting up cabal-install to stitch together and check various pieces of a downloaded file.
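For the cabal side, steps 1 and 2 could be as small as the following sketch (the directory layout and the fall-back hook are placeholders, not existing cabal-install code):

```
import System.Directory (doesDirectoryExist, findExecutable)
import System.Process   (callProcess)

-- Steps 1 and 2 above, roughly: do an incremental `git pull` of the
-- metadata repository when git is available and a clone already
-- exists; otherwise fall back to the current behaviour of downloading
-- the full index tarball.
updateIndex :: FilePath   -- ^ e.g. ~/.cabal/all-cabal-files
            -> String     -- ^ repository URL
            -> IO ()      -- ^ existing full-download code path
            -> IO ()
updateIndex repoDir repoUrl downloadFullIndex = do
  mgit <- findExecutable "git"
  case mgit of
    Nothing  -> downloadFullIndex
    Just git -> do
      cloned <- doesDirectoryExist repoDir
      if cloned
        then callProcess git ["-C", repoDir, "pull", "--ff-only"]
        else callProcess git ["clone", repoUrl, repoDir]
```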

I was actually planning on proposing this some time next week. Can you tell me the downsides of using Git here, which seems to fit all the benefits you touted of:


> pretty simple, using standard old well understood formats and protocols with wide tool support.

Duncan Coutts

unread,
Apr 16, 2015, 7:58:44 AM4/16/15
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
On Thu, 2015-04-16 at 11:18 +0000, Michael Snoyman wrote:
> On Thu, Apr 16, 2015 at 1:57 PM Duncan Coutts <dun...@well-typed.com> wrote:

> > I was not proposing to change the repository format significantly (and
> > only in a backwards compatible way). The existing format is pretty
> > simple, using standard old well understood formats and protocols with
> > wide tool support.
> >
> > The incremental update is fairly unobtrusive. Passive http servers don't
> > need to know about it, and clients that don't know about it can just
> > download the whole index as they do now.
> >
> > The security extensions for TUF are also compatible with the existing
> > format and clients.
> >
> The theme you seem to be creating here is "compatible with current format."
> You didn't say it directly, but you've strongly implied that, somehow, Git
> isn't compatible with existing tooling. Let me make clear that that is, in
> fact, false[1]:

Sure, one can use git or rsync or other methods to transfer the set of
files that makes up a repository or repository index. The point is,
existing clients expect both this format and this (http) protocol.

There's a number of other minor arguments to be made here about what's
simpler and more backwards compatible, but here are two more significant
and positive arguments:

1. This incremental update approach works well with the TUF
security design
2. This approach to transferring the repository index and files has
a much lower security attack surface

For 1, the basic TUF approach is based on a simple http server serving a
set of files. Because we are implementing TUF for Hackage we picked this
update method to go with it. It's really not exotic, the HTTP spec says
about byte range requests: "Range supports efficient recovery from
partially failed transfers, and supports efficient partial retrieval of
large entities." We're doing an efficient partial retrieval of a large
entity.

For 2, Mathieu elsewhere in this thread pointed to an academic paper
about attacks on package repositories and update systems. A surprising
number of these are attacks on the download mechanism itself, before you
even get to trying to verify individual package signatures. If you read
the TUF papers you see that they also list these attacks and address
them in various ways. One of them is that the download mechanism needs
to know in advance the size (and content hash) of entities it is going
to download.

Also, we should strive to minimise the amount of complex unaudited code
that has to run before we get to checking the signature of the package
index (or individual package tarballs). In the TUF design, the only code
that runs before verification is downloading two files over HTTP (one
that's known to be very small, and the other whose length and signed
content hash we already know). If we're being paranoid we shouldn't even
run any decompression before signature verification. With our
implementation the C code that runs before signature verification is
either none, or just zlib decompression if we want to do on-the-fly http
transport compression, but that's optional if we don't want to trust
zlib's security record (though it's extremely widely used).

By contrast, if we use rsync or git then there's a massive amount of
unaudited C code that is running with your user credentials prior to
signature verification. In addition it is likely vulnerable to endless
data and slow download attacks (see the papers).

Duncan Coutts

unread,
Apr 16, 2015, 8:02:34 AM4/16/15
to Gershom B, commerci...@googlegroups.com, Carter Schonwald, haskel...@haskell.org, den...@deathbytape.com, haskell-inf...@community.galois.com, arn...@capital-match.com
On Wed, 2015-04-15 at 00:07 -0400, Gershom B wrote:
> So I want to focus just on the idea of a “trust model” to hackage
> packages.

Good. I think TUF has a good answer here.

> Now, how does security fit into this? Well, at the moment we can
> prevent packages from being uploaded by people who are not authorized.
> And whoever is authorized is the first person who uploaded the
> package, or people they delegate to, or people otherwise added by
> hackage admins via e.g. the orphaned package takeover process.

As Michael rightly points out, though the hackage server does this, it
doesn't generate any cryptographic evidence for it. TUF solves that part
with its "target key delegation" information. It's the formal metadata
for who is allowed to upload what. So if we implement this part of TUF
then we no longer have to rely on the hackage server not getting hacked
to ensure this bit.

[...]
> that attempts a _much simpler_ guarantee — that e.g. the person who
> signed a package as being “theirs” is either the same person that
> signed the prior version of the package, or was delegated by them (or
> hackage admins).

That's what TUF's target key system provides. There's a target key held
by the hackage admins (and signed by the root keys) that is used to sign
individual author keys and delegation information to say that this key
is allowed to sign this package.

So it's not a guarantee that the package is good, or that the author is
a sensible person, but it is formal evidence that that person should be
in the maintainer group for that package.

Then, because TUF makes this relatively lightweight, it's fully
automatic for end users: the chain (not web) of trust is trivial.
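As a toy model of the information that delegation metadata carries
(these are not hackage-security's actual types, just an illustration):

```
newtype KeyId       = KeyId String       deriving (Eq, Show)
newtype PackageName = PackageName String deriving (Eq, Show)

-- What a piece of target key delegation asserts, roughly: the Hackage
-- target key (itself signed by the offline root keys) signs a statement
-- saying which author keys may sign releases of which package, and how
-- many of them must agree.
data Delegation = Delegation
  { delPackage    :: PackageName
  , delAuthorKeys :: [KeyId]
  , delThreshold  :: Int
  }

-- A client accepts an author signature only if the key appears in a
-- delegation for that package that chains back to the root keys. No web
-- of trust is involved; the chain of trust is explicit and tiny.
-- (delThreshold is ignored here for brevity.)
authorised :: Delegation -> KeyId -> Bool
authorised d k = k `elem` delAuthorKeys d
```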

> In my mind, the key elements of such a system are that it is
> orthogonal to how code is distributed and that it is opt-in/out.

Yes, our TUF adaptation for Hackage includes the author keys being
optional (and TUF is designed to be adapted in this way). Once you
opt-in for a package then the delegation information makes clear to
clients that they must expect to see an individual package signature. So
you can have a mixture of author-signed packages and not, without
downgrade attacks. The target key delegation information makes it clear.

Michael Snoyman

unread,
Apr 16, 2015, 8:18:40 AM4/16/15
to Duncan Coutts, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com

I never claimed nor intended to imply that range requests are non-standard. In fact, I'm quite familiar with them, given that I implemented that feature of Warp myself! What I *am* claiming as non-standard is using range requests to implement an incremental update protocol for a tar file. Is there any prior art for this working correctly? Do you know that web servers will do what you need and serve the byte offsets from the uncompressed tar file instead of the compressed tar.gz? Where are you getting the signatures from, and how does this interact with 00-index.tar.gz files served by non-Hackage systems?

On the security front: it seems that we have two options here:

1. Use a widely used piece of software (Git), likely already in use by the vast majority of people reading this mailing list, relied on by countless companies and individuals, holding source code for the kernel of likely every mail server between my fingertips and the people reading this email, to distribute incremental updates. And as an aside: that software has built in support for securely signing commits and verifying those signatures.

2. Write brand new code deep inside two Haskell codebases with little scrutiny to implement a download/update protocol that (to my knowledge) has never been tested anywhere else in the world.

Have I misrepresented the two options at all?

I get that you've been working on this TUF-based system in private for a while, and are probably heavily invested already in the solutions you came up with in private. But I'm finding it very difficult to see the reasoning for reinventing wheels that don't need reinventing.


Michael

Mathieu Boespflug

unread,
Apr 16, 2015, 8:39:31 AM4/16/15
to Duncan Coutts, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
I'd like to step back from the technical discussion here for a moment
and expand a bit on a point at the end of my previous email, which is
really about process.

After I first uploaded a blog post about service architectures and
package distribution that was a recent interest of mine, I was very
surprised and happy to hear that several parties had not only already
been thinking about these very topics but moreover already had various
small prototypes lying around. This was also the case for *secure*
package distribution. What puzzled me, however, is that this came in
the form of multiple private messages from multiple sources, sometimes
referring to said parties only vaguely and without
identifying them. A similar story occurred when folks first started
evoking package signing some years ago.

Be it on robust identification of the provenance of packages,
distribution packages and their metadata, more robust sandboxes or any
other topic that touches upon our core infrastructure and tooling, it
would be really great if people made themselves known and came forth
with a) the requirements they seek to work against, b) their ideas to
solve them and c) the resources they need or are themselves willing to
bring to bear.

It ultimately hurts the community when people repeatedly say things to
the effect of, "yep, I hear you, interesting topic, I have a really
cool solution to all of what you're saying - will be done Real Soon
Now(tm)", or are happy to share details but only within a limited
circle of cognoscenti. Because the net result is that other interested
parties either unknowingly duplicate effort, or stall thinking that
others are tackling the issue, sometimes for years.

I know that the IHG has been interested in more secure package
distribution for a very long time now, so it's really great that
Duncan and Austin have now ("finally") taken the time to write up
their current plan, moreover with a discussion of how it addresses a
specific threat model, and make it known to the rest of the community
that they have secured partial funding from the IHG. I know there
other efforts out there, it would be great if they all came out of the
woodwork. And in the future, if we could all be mindful to *publish*
proposals and intents *upfront* when it comes to our shared community
infrastructure and community tooling (rather than months or years
later). I believe that's what is at the core of an *open* process for
community developments.

Ok, end of meta point, I for one am keen to dive back into the
technical points that have been brought up in this thread already. :)

Gershom B

unread,
Apr 16, 2015, 11:07:03 AM4/16/15
to Duncan Coutts, Mathieu Boespflug, haskell-inf...@community.galois.com, commerci...@googlegroups.com, Haskell Cafe
On April 16, 2015 at 8:39:40 AM, Mathieu Boespflug (mb...@tweag.net) wrote:

> It ultimately hurts the community when people repeatedly say things to
> the effect of, "yep, I hear you, interesting topic, I have a really
> cool solution to all of what you're saying - will be done Real Soon
> Now(tm)", or are happy to share details but only within a limited
> circle of cognoscenti. Because the net result is that other interested
> parties either unknowingly duplicate effort, or stall thinking that
> others are tackling the issue, sometimes for years.

I think this is a valid concern. Let me make a suggestion as to why this does not happen as much as we might like as well (other than not-enough-time which is always a common reason). Knowing a little about different people’s style of working on open source projects, I have observed that some people are keen to throw out lots of ideas and blog while their projects are in the very early stages of formation. Sometimes this leads to useful discussions, sometimes it leads to lots of premature bikeshedding. But, often, other people don’t feel comfortable throwing out what they know are rough and unfinished thoughts to the world. They would rather either polish the proposal more fully, or would like to have a sufficient proof-of-concept that they feel confident the idea is actually tractable. I do not mean to suggest one or the other style is “better” — just that these are different ways that people are comfortable working, and they are hardwired rather deeply into their habits.

In a single commercial development environment, these things are relatively more straightforward to mediate, because project plans are often set top down, and there are in fact people whose job it is to amalgamate information between different developers and teams. In an open source community things are necessarily looser. There are going to be a range of such styles and approaches, and while it is sort of a pain to negotiate between all of them, I don’t really see an alternative.

So let me pose the opposite thing too: if there is a set of concerns/ideas involving core infrastructure and possible future plans, it would be good to reach out to the people most involved with that work and check if they have any projects underway but perhaps not widely announced that you might want to be aware of. I know that it feels it would be better to have more frequent updates on what projects are kicking around and what timetables. But contrariwise, it also feels it would be better to have more people investigate more as they start to pursue such projects.

Also, it is good to have different proposals on the table, so that we can compare them and stack up what they do and don’t solve more clearly. So, to an extent, I welcome duplication of proposals as long as the discussion doesn’t fragment too far. And it is also good to have a few proofs-of-concept floating about to help pin down the issues better. All this is also very much in the open source spirit.

One idea I have been thinking about, is a Birds of a Feather meeting at the upcoming ICFP in Vancouver focused just on Haskell Open-Source Infrastructure. That way a variety of people with a range of different ideas/projects/etc. could all get together in one room and share what they’re worried about and what they’re working on and what they’re maybe vaguely contemplating on working on. It’s great to see so much interest from so many quarters in various systems and improvements. Now to try and facilitate a bit more (loose) coordination between these endeavors!

Cheers,
Gershom

P.S. as a general point to bystanders in this conversation — it seems to me one of the best ways to help the pace of “big ticket” cabal/hackage-server work would be to take a look at their outstanding lists of tracker issues and see if you feel comfortable jumping in on the smaller stuff. The more we can keep the little stuff under control, the better for the developers as a whole to start to implement more sweeping changes.


Michael Snoyman

unread,
Apr 16, 2015, 11:28:12 AM4/16/15
to Duncan Coutts, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Minor update. Some of your points about checking signatures before unpacking made me curious about what Git had to offer in these circumstances. For those like me who were unaware of the functionality, it turns out that Git has the option to reject non-signed commits, just run:

git pull --verify-signatures

I've set up the Travis job that pulls from Hackage to sign its commits with the GPG key I've attached to this email (fingerprint E595 AD42 14AF A6BB 1552  0B23 E40D 74D6 D6CF 60FD).


Duncan Coutts

unread,
Apr 16, 2015, 12:02:39 PM4/16/15
to Mathieu Boespflug, Haskell Cafe, commerci...@googlegroups.com
On Thu, 2015-04-16 at 14:39 +0200, Mathieu Boespflug wrote:
> I'd like to step back from the technical discussion here for a moment
> and expand a bit on a point at the end of my previous email, which is
> really about process.

I should apologise for not publishing our design earlier. To be fair I
did mention several times on the commercialhaskell mailing list earlier
this year that we were working on an index signing based approach.

Early on in the design process we did not appreciate how much TUF
overlaps with a GPG-author-signing based approach; we had thought they
were much more orthogonal.

My other excuse is that I was on holiday while much of the recent design
discussion on Chris and your proposals had been going on.

And finally, writing up comprehensible explanations is tricky and time
consuming.

But ultimately these are just excuses. We do always intend to do things
openly in a collaborative way, the Cabal and hackage development is
certainly open in that way, and we certainly never hold things back as
closed source. In this case Austin and I have been doing intensive
design work, and it was easier for us to do that between ourselves
initially given that we're doing it on work time. I accept that we
should have got this out earlier, especially since it turns out the
other designs do have some overlap in terms of goals and guarantees.

> Ok, end of meta point, I for one am keen to dive back into the
> technical points that have been brought up in this thread already. :)

Incidentally, having read your post on splitting things up a bit when I
got back from holiday, I agree there are certainly valid complaints
there. I'm not at all averse to factoring the hackage-server
implementation slightly differently, perhaps so that the core index and
package serving is handled by a smaller component (e.g. a dumb http
server). For 3rd party services, the goal has always been for the
hackage-server impl to provide all of its data in useful formats. No
doubt that can be improved. Pull requests gratefully accepted.

I see this security stuff as a big deal for the reliability because it
will allow us to use public untrusted mirrors. That's why it's important
to cover every package. That and perhaps a bit of refactoring of the
hackage server should give us a very reliable system.

Mathieu Boespflug

unread,
Apr 16, 2015, 4:40:01 PM4/16/15
to Gershom B, Duncan Coutts, haskell-inf...@community.galois.com, commerci...@googlegroups.com, Haskell Cafe
Thank you for that Gershom. I think everything that you're saying in
that last email is very much on the mark. Multiple proposals is
certainly a good thing for diversity, a good thing to help take our
infrastructure in a good direction, and a good thing to help it evolve
over time. It's true that most of us are volunteer contributors,
working on improving infrastructure only so long as it's fun. So it's
not always easy to ask for more upfront clarity and pitch perfect
coordination. Then again, as a community we make progress faster when a
little bit of process is followed.

While a million different tools or libraries to do the same thing can
coexist just fine, with infrastructure that's much more difficult. A
single global view of all code that people choose to contribute as
open source is much healthier than a fragmented set of sub communities
each working with their own infrastructure. So the degree of
coordination required to make infrastructure evolve is much higher. To
this end, I'd like to strongly encourage all interested parties to
publish into the open proposals covering one or both of the topics
that are currently hot infrastructure topics in the community:

1. reliable and efficient distribution of package metadata, package
content and of incremental updates thereof.
2. robust and convenient checking of the provenance of a package
version and policies for rejecting such package versions as
potentially unsafe.

These two topics overlap of course, so as has been the case so far
often folks will be addressing both simultaneously. I submit that it
would be most helpful if these proposals were structured as follows:

* Requirements addressed by the proposal (including *threat model*
where relevant)
* Technical details
* Ideally, some indication of the resources needed and a timeline.

I know that the last point is of particular interest to commercial
users, who like predictability in order to decide whether or not they
need to be chipping in their own meagre resources to make the proposal
happen and happen soon. But to some extent so does everyone else: no
one likes to see the same discussions drag on for 2+ years. Openness
really helps here - if things end up dragging out others can pick up
the baton where it was left lying.

So far we have at least 2 proposals that cover at least the first two
sections above:

* Chris Done's package signing proposal:
https://github.com/commercialhaskell/commercialhaskell/wiki/Package-signing-proposal
* Duncan Coutts and Austin Seipp's proposal for improving Hackage
security: http://www.well-typed.com/blog/2015/04/improving-hackage-security/

There are other draft (or "strawman") proposals (including one of
mine) floating around out there, mentioned earlier in this thread. And
then some (including prototype implementations) that I can say I and
others have engaged with via private communication, but it would
really help this discussion move forward if they were public.

> One idea I have been thinking about, is a Birds of a Feather meeting at the upcoming ICFP in Vancouver focused just on Haskell Open-Source Infrastructure.

I think that's a great idea.

Best,

Mathieu

Mathieu Boespflug

unread,
Apr 16, 2015, 5:03:02 PM4/16/15
to Duncan Coutts, Haskell Cafe, commerci...@googlegroups.com
> Incidentally, having read your post on splitting things up a bit when I
> got back from holiday, I agree there are certainly valid complaints
> there. I'm not at all averse to factoring the hackage-server
> implementation slightly differently, perhaps so that the core index and
> package serving is handled by a smaller component (e.g. a dumb http
> server). For 3rd party services, the goal has always been for the
> hackage-server impl to provide all of its data in useful formats. No
> doubt that can be improved. Pull requests gratefully accepted.

Awesome. Sounds like we're in broad agreement.

> I see this security stuff as a big deal for the reliability because it
> will allow us to use public untrusted mirrors. That's why it's important
> to cover every package. That and perhaps a bit of refactoring of the
> hackage server should give us a very reliable system.

Indeed - availability by both reliability and redundancy. I still have
some catching up to do on the technical content of your proposal and
others - let me comment on that later. But either way I can certainly
agree with the goal of reducing the size of the trusted base while
simultaneously expanding the number of points of distribution.

In the meantime, mirrors already exist (e.g.
http://hackage.fpcomplete.com/), but as you say, they need to be
trusted, in addition to having to trust Hackage.

Thanks again for your detailed blog post and the context it provides.

Best,

Mathieu

Mathieu Boespflug

unread,
Apr 28, 2015, 5:09:22 PM4/28/15
to ra...@twistedmatrix.com, haskell-inf...@community.galois.com, Haskell-cafe Cafe, sp...@scientician.net, mic...@snoyman.com, commerci...@googlegroups.com
[removing erroneous haskel...@googlegroups.com from To list.]

On 28 April 2015 at 23:07, Mathieu Boespflug <mb...@tweag.net> wrote:
> This is a valid concern. One that I should have addressed explicitly
> in the proposal. Git is fairly well supported on Windows these days
> and installs easily. It could conceivably be included as part of
> MinGHC. There are many alternatives, but I doubt we'll need them:
> statically linking a C implementation (libgit2 or another), or a
> simple native implementation of the git protocol (the protocol is
> quite straightforward and is documented) and basic disk format.
>
> The same is true about GnuPG, via gpg4win, though note that under this
> proposal GnuPG wouldn't be a requirement for `cabal update` to work.
> Just an additional optional dependency which you'll want to have
> installed if you want to protect yourself from the attacks listed in
> the proposal.
>
> By the way, one side note about this Git proposal: it side-steps the
> discussion around how to add SSL support to cabal-install entirely,
> since Git understands (among others) HTTPS natively, so we can
> outsource our support for that to Git. In any case SSL is no longer a
> necessity for protecting against MITM (the commit signing takes care
> of that), only a nice-to-have for privacy.
>
> On 28 April 2015 at 19:46, <ra...@twistedmatrix.com> wrote:
>> To me, the elephant in the room is how the dependency on Git will be
>> handled. I'm not a Windows user, but how much more painful will it be to set
>> up a Haskell environment on Windows with a new dependency on Git? Will users
>> need to install it separately, or do you suggest embedding Git into the
>> relevant tools? Should the Haskell Platform bundle it? What about MinGHC?
>> Oh, and I guess the same question can be asked about GnuPG.
>>
>> I personally use a Mac and Homebrew, so it's pretty easy for me to install
>> those dependencies, and I'm sure the same is true on Linux. But also, not
>> everyone uses Homebrew (in fact, I'm sure most programmers on Macs don't use
>> it), so it's also worth considering whether the requisite tools should be
>> embedded in the "GHC for Mac OS X" distribution.
>>
>> On Linux this probably isn't an issue because pretty much everyone has a
>> decent dependency-tracking package manager.
>>
>> I don't know if you care personally about these issues, but I think any
>> proposal which introduces new dependencies to the core development
>> environment of Haskell should take it into consideration. Very few people
>> have Git and GPG already installed, and I think the new-user experience
>> should be considered, and I'm surprised nobody has mentioned it in this
>> entire thread (unless I missed it).
>>
>> -- radix (Christopher Armstrong)
>>
>> P.S. I'm very excited to see this work, including the emphasis on using the
>> well-researched TUF. Thanks to you and other people working on this. :)
>>
>> On Tuesday, April 28, 2015 at 5:07:56 AM UTC-5, Mathieu Boespflug wrote:
>>>
>>> Hi all,
>>>
>>> last week, I found some time to write up a very simple proposal that
>>> addresses the following goals simultaneously:
>>>
>>> - maintain a difficult to forge public audit log of Hackage updates;
>>> - make downloads from Hackage mirrors just as trustworthy as
>>> downloading from Hackage itself;
>>> - guarantee that `cabal update` is always pulling the freshest package
>>> index (called "snapshots" in the proposal), and detect when this might
>>> not be the case;
>>> - implement the first half of TUF (namely the index signing part
>>> discussed in Duncan's blog post, not the author package signing part)
>>> with fewer metadata files and in a way that reuses existing tooling;
>>> - get low-implementation-cost, straightforward and incremental `cabal
>>> update`.
>>>
>>> After a preliminary review from a few colleagues and friends in the
>>> community, here is the proposal, in the form of Commercial Haskell
>>> wiki page:
>>>
>>>
>>> https://github.com/commercialhaskell/commercialhaskell/wiki/Git-backed-Hackage-index-signing-and-distribution
>>>
>>> The design constraints here are:
>>>
>>> - stay backwards compatible where the cost for doing so is low.
>>> - reuse existing tooling and mechanisms, especially when it comes to
>>> key management, snapshot identity, and distributing signatures.
>>> - Focus on the above 5 goals only, because they happen to all be
>>> solvable by changing a single piece of mechanism. But strive to reuse
>>> whatever mechanism others are proposing to solve other goals (e.g.
>>> certification of provenance using author package signing, as Chris
>>> Done has already proposed).
>>>
>>> To that effect, the tl;dr is that I'm proposing that we just use Git
>>> for maintaining the Hackage package index, that we use Git for
>>> synchronizing this locally, and that we use Git commit signatures for
>>> implementing the first half of TUF. The Git tooling currently assumes
>>> GnuPG keys for signatures, so I'm proposing that we use GnuPG keys for
>>> signing, and that we manage key revocation and any trust delegation
>>> between keys using GnuPG and its existing infrasture.
>>>
>>> I estimate the total effort necessary here to be the equivalent of 5-6
>>> full time days overall. However, I have not pooled the necessary
>>> resources to carry that out yet. I'd like to get feedback first before
>>> going ahead with this, but in meantime,
>>>
>>> ** if there are any volunteers that would like to signal their intent
>>> to help with the implementation effort then please add your name at
>>> the bottom of the wiki page. **
>>>
>>> Best,
>>>
>>> Mathieu
>>>
>>> On 18 April 2015 at 20:11, Michael Snoyman <mic...@snoyman.com> wrote:
>>> >
>>> >
>>> > On Sat, Apr 18, 2015 at 12:20 AM Bardur Arantsson
>>> > <sp...@scientician.net>
>>> > wrote:
>>> >>
>>> >> On 17-04-2015 10:17, Michael Snoyman wrote:
>>> >> > This is a great idea, thank you both for raising it. I was discussing
>>> >> > something similar with others in a text chat earlier this morning.
>>> >> > I've
>>> >> > gone ahead and put together a page to cover this discussion:
>>> >> >
>>> >> >
>>> >> >
>>> >> > https://github.com/commercialhaskell/commercialhaskell/blob/master/proposal/improved-hackage-security.md
>>> >> >
>>> >> > The document definitely needs more work, this is just meant to get
>>> >> > the
>>> >> > ball
>>> >> > rolling. As usual with the commercialhaskell repo, if anyone wants
>>> >> > edit
>>> >> > access, just request it on the issue tracker. Or most likely, send a
>>> >> > PR
>>> >> > and
>>> >> > you'll get a commit bit almost magically ;)
>>> >>
>>> >> Thank you. Just to make sure that I understand -- is this page only
>>> >> meant to cover the original "strawman proposal" at the start of this
>>> >> thread, or...?
>>> >>
>>> >> Maybe you intend for this to be extended in a detailed way under the
>>> >> "Long-term solutions" heading?
>>> >>
>>> >> I was imagining a wiki page which could perhaps start out by collecting
>>> >> all the currently identified possible threats in a table, and then all
>>> >> "participants" could perhaps fill in how their suggestion addresses
>>> >> those threats (or tell us why we shouldn't care about this particular
>>> >> threat). Of course other relevent non-threat considerations might be
>>> >> relevant to add to such a table, such as: how prevalent is the
>>> >> software/idea we're basing this on? does this have any prior
>>> >> implementation (e.g. the append-to-tar and expect that web servers will
>>> >> behave sanely thing)? etc.
>>> >>
>>> >> (I realize that I'm asking for a lot of work, but I think it's going to
>>> >> be necessary, at least if there's going to be consensus and not just a
>>> >> de-facto "winner".)
>>> >>
>>> >>
>>> >
>>> > Hi Bardur,
>>> >
>>> >
>>> > I don't think I have any different intention for this page than you've
>>> > identified. In fact, I thought that I had clearly said exactly what you
>>> > described when I said:
>>> >
>>> >> There are various ideas at play already. The bullets are not intended
>>> >> to
>>> >> be full representations of the proposals, but rather high level
>>> >> summaries.
>>> >> We should continue to expand this page with more details going forward.
>>> >
>>> > If this is unclear somehow, please tell me. But my intention absolutely
>>> > is
>>> > that many people can edit this page to add their ideas and we can flesh
>>> > out
>>> > a complete solution.
>>> >
>>> > Michael
>>> >
>>> > _______________________________________________
>>> > Haskell-Cafe mailing list
>>> > Haskel...@haskell.org
>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>> >
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskel...@haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Bardur Arantsson

unread,
Apr 29, 2015, 6:35:07 AM4/29/15
to commerci...@googlegroups.com, haskell-inf...@community.galois.com, haskel...@haskell.org
On 28-04-2015 23:09, Mathieu Boespflug wrote:
> [removing erroneous haskel...@googlegroups.com from To list.]

(I'm not the person you're responding to. From the mail headers, I can't
see the person(s) you're responding to, but so be it.)

Do you have evidence that your approach is superior, and could you
please cite it? (Or, alternatively, provide negative evidence for
$OTHER_APPROACH.)

Regards,

Mathieu Boespflug

unread,
Apr 29, 2015, 10:49:36 AM4/29/15
to Bardur Arantsson, commerci...@googlegroups.com, haskell-inf...@community.galois.com, haskel...@haskell.org
Define "superior"?

As argued in the proposal, the salient features are that by devolving practically everything to Git+GPG, we end up with less code to maintain in our tooling, less code to maintain in our infrastructure (namely hackage-server), a more reliable service, and a smaller chance of buggering up security-related activities (such as signing and managing trust).

We're not introducing dependencies on dynamically linked system libraries that make tooling hard to distribute. We're not asking users to install anything new that isn't already a staple of most developer desktops, and we're not asking users, Hackage trustees, or Hackage admins to manage new identities with new key formats beyond the ones they already have (namely GnuPG keys). Further, users can still opt out of signature verification if they want to.

Compared to alternative approaches: there has been a proposal to get incremental updates, à la Git, by growing (potentially without bound) the end of a tar file served by the server over HTTP. That design means downloading the full history of package revisions cannot easily be opted out of. With Git you get that opt-out for free, since users can `git clone --depth=1` and still do a `git pull` later and verify signatures. You further get the advantage of being able to mine the history of changes directly, using standard tools, something that can't be done on the tar file without more custom tooling (or conversion to Git after the fact).
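
To make that client-side flow concrete, here is a minimal sketch of a shallow mirror plus incremental update, shelling out to git from Haskell. To be clear, nothing like this exists in any tool today; the repository URL and directory name are hypothetical placeholders, and it assumes signed commits plus the relevant public keys already being in the local keyring.

    -- Sketch only: shallow-clone a hypothetical package-index repository,
    -- then update it incrementally and check the signature on the tip.
    import System.Directory (doesDirectoryExist)
    import System.Process (callProcess)

    indexRepo :: String
    indexRepo = "https://example.org/package-index.git"  -- hypothetical mirror URL

    indexDir :: FilePath
    indexDir = "package-index"

    syncIndex :: IO ()
    syncIndex = do
      cloned <- doesDirectoryExist indexDir
      if cloned
        then do
          -- Incremental update: only new commits are fetched.
          callProcess "git" ["-C", indexDir, "pull", "--ff-only"]
          -- Check the GPG signature on the latest commit.
          callProcess "git" ["-C", indexDir, "verify-commit", "HEAD"]
        else
          -- Initial fetch: a shallow clone keeps the download small while
          -- still allowing ordinary pulls later.
          callProcess "git" ["clone", "--depth=1", indexRepo, indexDir]

    main :: IO ()
    main = syncIndex

A full clone (without `--depth`) can likewise be mined with plain `git log` or `git diff`, which is the history-mining advantage mentioned above.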


Blake Rain

unread,
May 3, 2015, 7:09:14 AM5/3/15
to Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Hi All,

I have only just found the time to read through this discussion. I thought perhaps I would offer a few thoughts.

It seems that we are all in agreement that the security of Hackage/Cabal is a problem: insecure transmission and no way to verify package authorship. This is something which I feel we must address:

Where I work we have a lot of compliance to consider, and a few products require us to provide varying degrees of assurance about the code we link against. This usually leads to the decision to use a third-party piece of kit. Going forward, we will need to replace these systems with our own solutions. I would prefer to use Haskell, but swapping out Hackage/Cabal due to security concerns is undesirable from my point of view, and the lack of package security will be a show-stopper for senior management.

Using git and S3 as Michael suggests seems like a good solution to me. To my mind, the increased transparency, and the ability to mirror both the S3 bucket and the git repository holding the package metadata, offer a number of desirable features.

Regarding the use of git: I don't think that we need to implement our own solution, and depending on git is not an issue: Most of our CI uses git anyway.

A final point I feel I must raise: it seems that FP Complete are going to be footing the bill for the S3 hosting. Long term, this seems unfair to FP Complete. Is this something that haskell.org could take on? Or, at the very least, could some other mechanism be found to pay for the hosting, or to compensate FP Complete, in the long term?

Kind Regards,

- B.


Carter Schonwald

unread,
May 3, 2015, 11:49:58 AM5/3/15
to Blake Rain, Michael Snoyman, Haskell Cafe, haskell-inf...@community.galois.com, commerci...@googlegroups.com
Storage for every package ever released on Hackage in the history of Haskell totals less than 30 cents per month on S3 (probably closer to 10 cents). Even if the S3 host saw very high usage, the most it would cost per month in bandwidth is 50-90 dollars (and that is likely an overestimate by at least 10x). A small engineering team probably spends more than that on coffee per month.
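
For what it's worth, here is the back-of-the-envelope arithmetic behind those figures. Every input is an assumption on my part (a guess at the total archive size and monthly traffic, plus roughly 2015-era S3 list prices), not a measured number:

    -- Rough cost estimate; all constants are assumed, not measured.
    import Text.Printf (printf)

    storageGB, trafficGB, storagePricePerGB, transferPricePerGB :: Double
    storageGB          = 10     -- assumed size of the full package archive, in GB
    trafficGB          = 1000   -- assumed (generous) monthly download volume, in GB
    storagePricePerGB  = 0.03   -- USD per GB-month, approximate S3 standard storage
    transferPricePerGB = 0.09   -- USD per GB, approximate S3 outbound transfer

    main :: IO ()
    main = do
      printf "Storage per month:  $%.2f\n" (storageGB * storagePricePerGB)   -- ~$0.30
      printf "Transfer per month: $%.2f\n" (trafficGB * transferPricePerGB)  -- ~$90.00

Scale the traffic assumption down by 10x and the bandwidth bill lands in single digits of dollars per month.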

haskell.org hosting and infrastructure are largely donated by various organizations. If you can get AWS to top the Rackspace+DreamHost infra sponsorship, Gershom and others would probably love to hear about it.

