Package verification


Lachlan Gunn

Oct 28, 2015, 12:38:24 PM
to julia-dev
Hello,

Searching about, I've not found much on the subject of package signatures except for this old issue here:


Is this dead at the moment, or would people be interested if I were to do some research into how some different approaches might work (e.g. git signed tags/commits, how various other package managers do it, etc.)?

Perhaps it's a bit paranoid at this stage, but this could be somewhat important in the long run---lots of engineering/scientific work is going to be a target, and attacking scientific computing software seems like a good way to zero in on machines with design data.

Thanks,
Lachlan

Lachlan Gunn

Nov 4, 2015, 6:33:45 PM
to julia-dev
So I've had a look around and had a think about how things would work, and have written up my thoughts here:


To summarise, I think using Git's tag/commit signing mechanism is a no-go: it doesn't integrate well with the pull-request mechanism, can't be verified if you copy the package out of the repository, and gives METADATA.jl committers the power to modify all packages.  Since Git shows no signs of moving away from SHA-1, the security could be broken over the next few years, and there would be no easy way to transition to something better.

Probably the easiest solution to implement would be to add the signature functionality to PkgDev, and not rely on Git at all.  This way the signature can be included in the pull request, meaning you can do automated tests like TravisCI and such do, and no-one who doesn't hold a permission-granting key can modify other people's packages.  The only extra work would be for the package author to enter their passphrase before committing.  There are some issues to overcome, such as replay attacks---an attacker modifying METADATA.jl to point to an old version with a security bug---but nothing insurmountable, I think.
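To make the idea concrete, here is a rough sketch of what such a signed manifest travelling with a pull request might look like. This is purely illustrative Python, not the proposed PkgDev code; HMAC-SHA512 with a shared secret stands in for a real public-key signature (Ed25519, GPG, or similar), and all names are hypothetical:

```python
# Illustrative sketch only: a per-file hash manifest plus a signature
# over it. HMAC with a shared secret is a stand-in for a real
# public-key signature scheme.
import hashlib
import hmac
import json

def build_manifest(files):
    """Map each file path to the SHA-512 of its contents."""
    return {path: hashlib.sha512(data).hexdigest()
            for path, data in files.items()}

def sign_manifest(manifest, key):
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha512).hexdigest()

def verify_manifest(manifest, signature, key):
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.compare_digest(
        hmac.new(key, payload, hashlib.sha512).hexdigest(), signature)

files = {"src/Foo.jl": b"module Foo end\n"}
key = b"author-secret"
manifest = build_manifest(files)
sig = sign_manifest(manifest, key)
assert verify_manifest(manifest, sig, key)
assert not verify_manifest({"src/Foo.jl": "tampered"}, sig, key)
```

The signature file is just text, so it can sit alongside the code in the pull request and be checked by CI or by Pkg at install time.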

Anyway, that's my two cents, I might have a go at building a proof of concept with all of the signatures out-of-tree when I get a free moment.  I'd be interested to hear other people's opinions.

Thanks,
Lachlan

Stefan Karpinski

Nov 5, 2015, 9:40:10 AM
to juli...@googlegroups.com
This is a great document – I'd like to move to a Julep system where people write comprehensive documents like this in the future. I buy the argument that we should have an external hashing and signing system rather than relying on git alone. Git SHA-1 hashes can still serve as an additional layer of protection, but they aren't sufficient alone. We should definitely switch to getting metadata and packages over HTTPS regardless of anything else, as a matter of defense in depth.

I do think we need a Debian-like trust system, but I don't like the idea of a single signer per package. It also doesn't help if the official owner of a package is the attacker. So we would need a system where maintainers can vouch for how much they trust other people and their keys, those installing packages would have a set of people they trust, and trust in versions of packages would be derived from that network. This means that someone who is not the "owner" of a package could inspect a version, check that it doesn't introduce a security vulnerability as far as they can tell, and sign it. It may be worth distinguishing inspection of changes from inspection of versions: the former is much easier – just look at a diff – while the latter requires auditing the whole package. We're much likelier to get people to look at diffs, and to some extent that suffices if combined with periodic full review of the code base.

Lachlan Gunn

Nov 5, 2015, 10:19:44 AM
to juli...@googlegroups.com
Thanks, your praise is much appreciated.  My thinking is that no matter what you do, a sufficiently determined package maintainer will be able to put malicious code in, but what you say is pretty reasonable.  That said, I don't think auditing all packages is going to be scalable.  It would, however, be nice to be able to cryptographically certify that certain automated checks have been made, even if we don't do so for now.  But I wouldn't require it in general...can you imagine if Python or CPAN did so?  They would grind to a standstill.

We could probably make something reasonably streamlined by maintaining an external web application that talks to PkgDev and GitHub, letting the maintainer throw signatures around without having to upload a bunch of signature files to the pull request.  But it breaches one of the principles that I didn't explicitly state in the document: it should be possible to submit package revisions without back-and-forth with the maintainer, since a single round immediately doubles the amount of work involved.

Crypto-ing the audit process like that _would_ be good, though, if we were to start moving core packages into METADATA.jl, since that gives you a small target that already undergoes much scrutiny.

Regarding the trust/authorisation mechanisms, I'll put together another document on that.  I was envisaging this as a series of three documents, one on how they interact with the repository, one on how the trust mechanism would work, and one on how the signatures would actually be performed.

Are there enough people around the place to bootstrap a Debian-like system?  If there were the slightest hint that such a thing were to occur, then I'd want to start racing around France in the next few months distributing public keys before I have to go back to Australia where it's a huge pain to do so, lest it become important later to be in the web of trust.

Thanks,
Lachlan

Mauro

Nov 5, 2015, 10:44:06 AM
to juli...@googlegroups.com
I don't have much of a frame of reference here but: Arch Linux
implemented package signing relatively recently, around 2010. As Arch
tries to follow KISS principles, this may also be a good source of
inspiration?
A short web-search turned up this:
http://allanmcrae.com/2011/12/pacman-package-signing-4-arch-linux/
(although I have no idea whether this is any good).

Lachlan Gunn

Nov 5, 2015, 10:48:19 AM
to juli...@googlegroups.com
A quick look suggests that it is quite similar to APT/Yum---optional package signatures, with the repository structure itself signed too, all using GPG.

Stefan Karpinski

Nov 5, 2015, 12:20:50 PM
to juli...@googlegroups.com
I think the most helpful thing right now might be to try to list attack models, and then we can figure out if or how we are going to try to prevent or mitigate them. It's one thing to prevent someone from trying to alter the code of a released package version after the fact; it's a totally different thing to have some trust mechanism for deciding about the security of released code in the first place. But if you can't trust the released version, does it really matter if someone can change it? Do we want to take an approach that tries to prevent attacks, or do we want to make it so that an attack is highly auditable after the fact once it's discovered? In a lot of ways the latter is more useful (and prevents attacks for fear of discovery), but you kind of need to be able to trace things back to an actual person for it to have any efficacy. Of course, this is very much at odds with the semi-anonymous nature of open-source development – a lot of people don't even want to put their real picture on GitHub, let alone reveal who they are IRL. But without that, you can't really prevent or uncover anonymous attacks.

I can imagine a scheme where people who are known quantities can release code and people can use it just by virtue of trusting who they are via a crypto signature; other less known people can release code too but trust would have to be established by an external audit of that code rather than by knowing who they are.

Lachlan Gunn

Nov 5, 2015, 1:55:30 PM
to juli...@googlegroups.com

This was what I was hinting at with my security definitions. I don't think you can prevent anything without control over the merge process, unless you go completely to external tools for merging of pull requests. But it suffices to be auditable at the point of installation. Since there's interest I'll go through and build up some flow diagrams and fault trees and things to try to inform the next step.

Thanks,
Lachlan

Lachlan Gunn

Nov 7, 2015, 8:43:51 AM
to julia-dev
Ok, I have updated the document to talk a bit more about attack scenarios:


From a cryptographic perspective, all the signatures tell you is that a piece of code has not been changed since it was created, and so I have left out questions of auditing and trust, and just touched on dataflow.  The conclusion is that if you want gatekeepers to METADATA.jl to have any cryptographically-enforceable control, they need to publish signatures somewhere, which can't be done with the pull-request interface unless they are willing to give standing permission before seeing the code.

This standing permission can be as granular as you like, but it has to be enforceable by whoever is verifying the signature.  This could be as simple as a list of directories that the author is trusted to modify, or as complicated as requiring two signatures from three designated reviewers plus verification by a static-analysis tool that it cannot read or write external files.  The latter is pushing the bounds of realism, but I want to stress that you can put quite a bit of flexibility into the certificate system if need be, though obviously one is bounded by the KISS principle.
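As a toy illustration of how such a standing permission could be enforced at verification time, the sketch below (Python, with hypothetical names; no particular certificate format is assumed) checks both a directory whitelist and a k-of-n reviewer threshold:

```python
# Illustrative policy checks for a "standing permission" certificate.
# cert_dirs: directories the author may modify; designated/threshold:
# an optional k-of-n reviewer-signature requirement.
import posixpath

def paths_permitted(cert_dirs, touched_paths):
    """True if every touched path sits under a permitted directory."""
    def allowed(path):
        norm = posixpath.normpath(path)
        return any(norm == d.rstrip("/") or
                   norm.startswith(d.rstrip("/") + "/")
                   for d in cert_dirs)
    return all(allowed(p) for p in touched_paths)

def reviewers_satisfied(signers, designated, threshold):
    """True if at least `threshold` designated reviewers have signed."""
    return len(set(signers) & set(designated)) >= threshold

assert paths_permitted(["Example/"], ["Example/src/Example.jl"])
assert not paths_permitted(["Example/"], ["Other/src/evil.jl"])
assert reviewers_satisfied({"alice", "bob"}, {"alice", "bob", "carol"}, 2)
assert not reviewers_satisfied({"alice"}, {"alice", "bob", "carol"}, 2)
```

The point is that whatever the certificate grants, the verifier on the installing machine has to be able to evaluate it mechanically.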

Thanks,
Lachlan 

catc...@bromberger.com

Nov 7, 2015, 11:55:58 AM
to julia-dev
I'm not sure the extra hassle of trying to mitigate attack scenario 1 (malicious author) is worth it. I don't know of other FOSS efforts to do this, probably because it's difficult to do even in a commercial offering (see Apple's difficulty in curating its app store). In FOSS, the principle has shifted to "caveat emptor", I think particularly because the main/only cost to the buyer in this model is acceptance of some risk that the code does what s/he wants (and doesn't do what s/he doesn't want).

Attack scenarios 2-4 are similarly difficult to detect without some sort of review by third party curators, which probably becomes untenable at scale.

Seth.




Lachlan Gunn

Nov 7, 2015, 12:56:48 PM
to juli...@googlegroups.com
Scenario 2 (compromised repository) will ideally be caught if the author looks at any diffs on Github, at least for smallish changes, but I agree that this probably isn't practical to counter.

Scenario 3 (compromised channel between the author and Github) will be detected in some instances by certificate pinning, but in all cases by the author performing a signature before uploading to Github.

Scenario 4 (compromised author Github repository) will always be caught by an author signature.

Signing package code at the author's end at least gives you a guarantee that a compromise of Github doesn't give attackers free rein to install whatever they like; they're limited to replay attacks at worst, though even those would be nice to counter.
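One common way to counter the replay attack mentioned here, sketched below in Python with illustrative names only, is to embed a monotonically increasing sequence number in each signed manifest and have the client refuse to "upgrade" to anything older than it has already seen:

```python
# Sketch of client-side replay protection: track the highest sequence
# number accepted per package and reject anything older. The sequence
# number would live inside the signed manifest in a real system.

seen = {}  # package name -> highest sequence number accepted so far

def accept(package, sequence):
    """Accept a manifest only if it is at least as new as any seen before."""
    if sequence < seen.get(package, 0):
        return False  # replayed (older) manifest: reject
    seen[package] = sequence
    return True

assert accept("PyCall", 3)
assert accept("PyCall", 4)
assert not accept("PyCall", 2)  # attacker serving a stale, vulnerable version
```

This only helps a client that has seen the package before; first-time installs would still need a trusted, fresh view of the latest sequence numbers.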

I agree with you about auditing, but at the same time, if core packages ever end up being updated via Pkg, it would be nice to have _some_ sort of protection.  I only mentioned scenario one because auditing had come up previously, and so I figured it was worth noting that it can be enforced in this way.  Plus, the way things are now there is still _some_ auditing; if I put in a pull request out of nowhere changing the repository URL and SHA-1 of PyCall, there would be some fairly robust objections, I would hope.

Thanks,
Lachlan

Lachlan Gunn

Nov 7, 2015, 1:04:04 PM
to juli...@googlegroups.com

Stefan Karpinski

Nov 7, 2015, 11:28:23 PM
to juli...@googlegroups.com
While I don't think we can realistically guard against malicious authors, I think that we can make sure that author identity remains stable and verifiable over time. In open source, as much else, reputation is key: I trust a new package by the same author because I trust their previous work. That trust has value and it's a lot of trouble to go to earning that trust just to blow it on a malicious attack.

We could, for example, have a rule that you need to approve installs of packages and/or updates by new authors, while packages from authors who effectively already have the ability to compromise your system---because they already wrote code in it---are allowed automatically.

David Anthoff

Nov 8, 2015, 1:00:00 AM
to juli...@googlegroups.com

I think people would have to make trust decisions for VERY many people without the necessary information at hand to make a good decision. A typical user might have to make a trust decision for potentially hundreds of authors, if he/she uses a fair number of packages. But I don’t know any of them, so as a new user I really wouldn’t have the info to make these decisions.

 

I think if this “make trust decisions about authors” is really the goal, there would have to be some sort of network of trust: if some core users trust someone (Stefan etc.), then I’d be happy to automatically trust those authors as well.

 

But, quite frankly, the whole thing to me seems overkill, plus it seems to me that there are much more pressing issues in julia-land than this.

 

Cheers,

David

Stefan Karpinski

Nov 8, 2015, 3:09:14 AM
to juli...@googlegroups.com
That's not really the point. The point is to make it costly in terms of reputation to publish malicious code.

Lachlan Gunn

Nov 8, 2015, 9:14:56 AM
to julia-dev
Personally, my main worry is that Github or one of the METADATA committers gets compromised.  In that instance, trust in the authors is irrelevant; you just want them to be the same people who initially submitted.  Even if it's opt-in, it would be nice to have some guarantee that the big packages that 30%+ of users install haven't been messed with on the server.

I agree that it's not a huge issue now, but it's easier to deal with it now than to wait until there are 10k packages and 10M users, making METADATA a serious target.

Thanks,
Lachlan

catc...@bromberger.com

Nov 8, 2015, 10:46:49 AM
to julia-dev
But there are two scenarios here, neither of which has the intended effect.

1) Malicious author publishes malicious code: s/he's not worried about reputation; the only thing we can do here is to "ban" this person from having updates in whatever curated packaging system we have. S/he will always have the option of self-hosting it. We can do that anyway, today – the only difference is how we *detect* that the code is malicious.

2) Non-malicious author publishes malicious code (via attack by a malicious individual): now his/her reputation is tarnished due to an attack against his/her code. This seems an awful lot like victim-blaming.

catc...@bromberger.com

Nov 8, 2015, 10:51:33 AM
to julia-dev
There are, as of today, 68,967 Python packages in the PyPI repository. They don't do any sort of checking beyond signature matching, as far as I'm aware.

My suggestion to those who want verified code: make sure your binaries are verifiable as well (see Thompson's "Reflections on Trusting Trust" if you want to see how futile an exercise this is), and then create your own repo of julia packages that you've personally decided to trust.

Lachlan Gunn

Nov 8, 2015, 11:28:13 AM
to juli...@googlegroups.com
Yes, I think something like that is probably the direction to go, with some extra protection against replay attacks and against modifying other people's packages (just as pip and such presumably do with a login system).

In any case, I think the discussion is getting a bit off track---you can use a signature system to enforce all kinds of procedural things, but we can do that already and no-one has felt the need to.

If you make the process complicated then people will stop submitting packages, so how about as a starting point we draw a line by saying that the system must in the initial phase be no more complex than the existing one, and in particular that:

    1) The author _MUST_ be able to perform all package submissions without any extra commands; some user input for passphrases, key generation, or UID selection may be acceptable.

    2) It _SHOULD_ be possible to use it like pip and the current system without backing up private keys---this could be done by offering to store the encrypted private key somewhere else, assuming the author is willing to accept the risk, but it _MUST_ be optional.

    3) Accepting a new revision to METADATA.jl _MUST_ be possible without any additional work on the part of the author or merger.  A registration process _MAY_ be necessary for new authors or packages.

    4) The author _MUST_ be able to opt out of the use of signatures during the initial phase; existing packages are unsigned anyway and so this is necessary even if we ignore the user experience.

    5) Users _MUST_ be able to perform some level of useful package verification without manually establishing trust.

This lets us remove Github as a single point of failure and leaves the option open to add further functionality later, but doesn't mandate it.  Getting worked up now about curated package collections and auditing just adds fuel to the fire, when by dropping the controversial bits at first we could achieve a meaningful increase in security without making life any more difficult for users.

Thanks,
Lachlan

catc...@bromberger.com

Nov 8, 2015, 12:07:52 PM
to julia-dev
   3) Accepting a new revision to METADATA.jl _MUST_ be possible without any additional work on the part of the author or merger.  A registration process _MAY_ be necessary for new authors or packages.

This is an interesting idea. We could use something like keybase.io to allow identity assertion. (Whether or not the underlying identity is itself trustworthy is probably outside the scope of this discussion).

Seth.

Lachlan Gunn

Nov 8, 2015, 12:27:39 PM
to juli...@googlegroups.com
Oooh; thanks, I'd not seen Keybase before; it's an interesting idea.

I wasn't even thinking of anything that elaborate, just a list (signed, of course) of public keys and the directories in METADATA.jl to which they are permitted to attest.  Those master keys would need to be supplied with Pkg so that the list can be verified, and one of the holders would have to sign permission changes (possibly in a delegated manner).  This is the registration requirement that I mentioned.  They could further delegate permission control over a particular directory to a package owner, but that's starting to get complicated.

Before submitting the pull request, the author---or actually PkgDev---does something with a similar effect to "find . -type f | grep -v '^\./package\.sig$' | xargs sha512sum | gpg --clearsign -o package.sig".

After Pkg downloads the package, it looks at the public key list, checks permissions, makes sure package.sig validates, and verifies that the checksums are all correct.
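The checksum-comparison half of that verification step might look something like this sketch (Python with hypothetical names; the signature check itself is elided, only the tree hashing and diffing are shown):

```python
# Illustrative sketch: hash every file in a tree (skipping .git and the
# signature file itself), then compare against a previously signed
# manifest to find added, removed, and modified files.
import hashlib
import os

def hash_tree(root):
    sums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name == "package.sig":
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                sums[rel] = hashlib.sha512(f.read()).hexdigest()
    return sums

def compare(manifest, local):
    """Return (added, removed, changed) relative to the manifest."""
    added = set(local) - set(manifest)
    removed = set(manifest) - set(local)
    changed = {p for p in set(manifest) & set(local)
               if manifest[p] != local[p]}
    return added, removed, changed
```

Reporting additions and removals, not just modifications, matters: an attacker who can add one new file to a package has already won.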

Thanks,
Lachlan

Tony Kelman

Nov 8, 2015, 5:25:32 PM
to julia-dev
A Unix-only solution isn't going to fly here. Pkg and PkgDev need to work cross-platform. We might be able to get gpg via WinRPM, but that should probably be opt-in since it's an additional binary dependency.

Lachlan Gunn

Nov 8, 2015, 5:34:02 PM
to juli...@googlegroups.com

Yep, of course.  Particularly with me being a Windows user I'm quite aware of that :)

I just trust myself to write that command line more clearly than I could the English equivalent.

It will be worth seeing what we can do with existing dependencies; Python went with X.509 rather than PGP largely, as I understand it, because they preferred to use OpenSSL over GnuPG.  Though GnuTLS, I think, has the ability to do some PGP, from memory, so who knows at this stage.

Thanks,
Lachlan

Lachlan Gunn

Nov 9, 2015, 7:31:22 AM
to julia-dev
For an initial prototype, DJB's TweetNaCl library might be a good choice rather than going all out with GPG or OpenSSL.  It's very lightweight too, at 17 kB of C source code.  The licence is public domain; it's a cut-down version of NaCl by the same author, intended to be small enough to audit and far more portable, at the cost of performance due to the lack of optimised assembly code.  The full NaCl is used by Tor, and the author is reputable, so I don't think this is too outlandish as a first cut, and it is _very_ easy to use.  When I have a chance I'll wrap it and put something together.

Thanks,
Lachlan

Stefan Karpinski

Nov 9, 2015, 7:52:25 AM
to juli...@googlegroups.com
+1 to trying something small and simple by djb 

Lachlan Gunn

Nov 11, 2015, 3:55:57 PM
to julia-dev
So I've had a go at this.  It's not finished yet, but since there was a public holiday over here I've had some time to get properly started, and I thought I'd give a quick progress report.

The code is on Github here:


It's still a bit of a mess as things are continuously being refactored, but I've put examples in the README.md of how to use it to generate and verify some certificates.

It doesn't do any file-level verification, but I've wrapped the signature part of TweetNaCl and used it to implement a signature system with a single-level PKI.  When you initialise the system, it generates a master key that is used to sign all of the user certificates.  User certificates have a name and a list of assigned directories; in reality you would want at least an expiry date as well.  Finally, you can sign arbitrary data, which goes into a big JSON blob with all of the data needed to verify it back to the top-level key.
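In miniature, the chain looks like the sketch below. This is only a model of the data flow, not of the actual cryptography: HMAC with shared secrets is a loud stand-in for TweetNaCl's Ed25519 signatures, and in a real system the certificate would carry the user's public key rather than relying on shared keys.

```python
# Toy model of a single-level PKI: the master (CA) key signs a user
# certificate; the user key signs a payload; verification walks back
# to the master key. HMAC stands in for real signatures.
import hashlib
import hmac
import json

def sign(key, obj):
    blob = json.dumps(obj, sort_keys=True).encode()
    return hmac.new(key, blob, hashlib.sha512).hexdigest()

def verify(key, obj, sig):
    return hmac.compare_digest(sign(key, obj), sig)

master_key = b"master-secret"   # stand-in for the CA's private key
user_key = b"user-secret"       # stand-in for the user's private key

user_cert = {"name": "lachlan", "directories": ["Example/"]}
cert_sig = sign(master_key, user_cert)    # CA vouches for the user

payload = {"package": "Example", "hashes": {"src/Example.jl": "..."}}
payload_sig = sign(user_key, payload)     # user signs the manifest

# Verifier: trust the certificate via the master key, then the payload
# via the (now-trusted) user key.
assert verify(master_key, user_cert, cert_sig)
assert verify(user_key, payload, payload_sig)
```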

The next step is to write the code to generate checksums for a package and sign them, and then, after verifying the signatures, to make sure that a) all files have been checked, and b) all of the checked files are in directories that the user has permission to sign.

The whole "reinventing X.509" thing worries me a bit; hopefully we'll be able to use something like that, or PGP, eventually.  But this approach is quick and lets us play with the model.

Thanks,
Lachlan

Lachlan Gunn

Nov 21, 2015, 6:11:31 PM
to julia-dev
This is now up and running as a package verification tool.  You can call PkgVerifierPrototype.construct_package_certificate and PkgVerifierPrototype.verify_package_certificate; the former automatically hashes everything that isn't a directory called ".git"---an obvious security hole, but prototype is the key word here---and then signs it, while the latter undoes the process and compares against the list of local hashes.  It should also verify that the user actually has permission to sign that package, but I haven't done that yet.  You know the drill: don't use it for anything important, assume all security is on the honour system, etc.

The model that I've used is a single-level CA that delegates authority to users over certain paths.  This is done by signing the user's public key and a list of permissions with the CA key.  The user then attaches this certificate to all of their own certifications.  When you go to verify something, you use the CA public key to verify the user's certificate.  Since the user's key is now trusted, you pull it out and use it to verify the signature over the list of hashes.

It will detect changed files, but also added and removed files as well.

All of this is just a big pile of JSON in order to help me get something up and running quickly---obviously something a bit more user-friendly is probably desirable in the final system.  Here is the link again:



It doesn't do anything clever like verify dependencies, that would require dealing with METADATA.jl and isn't worth it for the moment.

Thanks,
Lachlan

Stefan Karpinski

Nov 24, 2015, 4:59:40 PM
to juli...@googlegroups.com
Thanks for coding this up. I'll take a look at this over the next few days. Would definitely be good if others want to take a look as well.

Lachlan Gunn

Nov 24, 2015, 5:47:10 PM
to juli...@googlegroups.com
No worries.  At the end of the day it's only a simple model, and probably has too many rough edges to be made into something more final, but it should be relatively straightforward to mash it into whatever is needed for a prototype.  The most glaring issues are the non-human-readable format and the fact that the private key can't be put onto a smartcard because it's using NaCl, but also that there is as yet no way to say "this package MUST always be signed", so that an attacker can't simply leave the package unsigned and put it through.

Once that is cleaned up, I'll have a go at making a little wrapper module for Pkg that can do automatic verification.  This will mean setting up a server (or just a git repository maybe) that can manage all of the signatures.  Once this is done, I think we would have the full workflow ready to test.

Joshua Ballanco

Nov 28, 2015, 8:00:05 AM
to juli...@googlegroups.com
I’m not sure I’ll have the time to review this code (though I’ll definitely try to make it a high priority should any spare time magically appear); however, I just wanted to cheer on from the sidelines and say that I support this effort 100%!

For those still skeptical of the need/usefulness of such an effort, the RubyGems compromise of ~3 years ago is worth reviewing: https://news.ycombinator.com/item?id=5139583 . While this was ultimately a mostly-harmless illustration of a potential exploit, had a malicious attacker discovered the compromise first, the potential for harm could’ve been severe.

In particular, the most troubling threat model is an attacker silently introducing a compromised version of a popular library. In Ruby’s case, this would be something like ActiveSupport (a key component of Rails). A silent modification of this gem could’ve opened backdoors on a wide swath of servers, compromising countless users’ sensitive information.

Granted, we should be so lucky for Julia to become so popular, and for any Julia library to garner the popularity of Rails, but as Ruby’s case illustrates, this sort of protection is FAR harder to introduce after the fact (and, indeed, I believe most RubyGems are STILL not signed).

Lachlan Gunn

Dec 6, 2015, 4:14:58 PM
to julia-dev
Hello all again,

A quick update now that I've had some time to work on this.  As well as adding code to actually check whether a user has control over a package, I have added code to send and receive certificates from a server.

As such, I have created a second project, which remains to be documented:

        https://github.com/LachlanGunn/PkgVerifierServerPrototype/

This implements the simplest of web services; after calling the serve(repo) function with the path to the repository holding the keys, it exposes:

    /push/<package> : POST data contains a certificate, responds with (quotes included) "SUCCESS" or "FAILURE" (if the certificate is invalid)
    /cert/<package> : Returns package certificate, or "FAILURE" (with quotes) if it is not present.

All data is just stored in memory.  An interesting point is that you can do all this without authentication---just accept any certificate that has the correct permissions and is newer than the existing one, since certificates are signed by the user.
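For illustration, the two endpoints can be mimicked in a few lines of stdlib Python. This is not the prototype's actual code, and the is_valid() hook below is hypothetical, marking where the real certificate and permission checks would go:

```python
# Minimal in-memory sketch of the /push/<package> and /cert/<package>
# endpoints described above, returning quoted "SUCCESS"/"FAILURE".
from http.server import BaseHTTPRequestHandler, HTTPServer

certs = {}  # package name -> certificate bytes

def is_valid(cert):
    return bool(cert)  # placeholder for real signature/permission checks

class Handler(BaseHTTPRequestHandler):
    def _reply(self, text):
        body = text.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        if self.path.startswith("/push/"):
            name = self.path[len("/push/"):]
            cert = self.rfile.read(int(self.headers["Content-Length"]))
            if is_valid(cert):
                certs[name] = cert
                self._reply('"SUCCESS"')
            else:
                self._reply('"FAILURE"')

    def do_GET(self):
        if self.path.startswith("/cert/"):
            name = self.path[len("/cert/"):]
            if name in certs:
                self._reply(certs[name].decode())
            else:
                self._reply('"FAILURE"')

# To run it: HTTPServer(("", 8000), Handler).serve_forever()
```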

The next step, I think, is to work out how to do key management.

Thanks,
Lachlan
