I would like to automate the signing of some of the release files we
upload to the release page, starting with the source tarballs. My
initial goal is to have a CI job that automatically creates, signs, and
uploads the source tarballs, whenever a new release is tagged. I would
also like the key used for signing to be a 'project' key and not
someone's personal key.
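As a rough illustration, the CI job might run something along these lines; this is only a sketch, and the tag name, key handling, and upload destination (the "uploader@" account in particular) are placeholders, not an agreed design:

  # Sketch of a CI signing job triggered by a release tag (names hypothetical).
  RELEASE_TAG=llvmorg-11.0.1
  # Create the source tarball from the tagged revision.
  git archive --format=tar --prefix=llvm-project-${RELEASE_TAG}/ ${RELEASE_TAG} \
    | xz > llvm-project-${RELEASE_TAG}.tar.xz
  # Sign with the project key, assumed to be available to the CI job.
  gpg --batch --detach-sign --armor llvm-project-${RELEASE_TAG}.tar.xz
  # Upload the tarball and its signature to the release page.
  scp llvm-project-${RELEASE_TAG}.tar.xz{,.asc} uploader@releases-origin.llvm.org:/releases/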
Once this is done, I would like to implement something similar for the
release binaries, so that testers could upload the binaries and have
them automatically signed. This will be more difficult than the source
tarballs, because the binaries are built by individual testers, so we
would need to prove that they come from a trustworthy source.
Implementing these changes will help streamline the release process and
let release managers avoid a lot of manual, mistake-prone tasks.
The questions I have for the community are:
Is this a good idea?
How can I implement this securely?
Thanks,
Tom
I'm not sure; this is one of the things I would like advice about. If
we used GitHub Actions to do the signing, then using secrets would be
one option. I think we could also host our own GitHub Actions runner
and store the keys there.
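For illustration, the secrets route might boil down to something like this in a job step; the variable name GPG_SIGNING_KEY and the file name are made up, and this is only a sketch:

  # Import the project key from a CI secret into an ephemeral keyring, then sign.
  echo "$GPG_SIGNING_KEY" | gpg --batch --import
  gpg --batch --yes --detach-sign --armor llvm-project.tar.xz
  # On a hosted runner the keyring disappears with the VM when the job finishes.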
-Tom
I'm not sure exactly what automatic signing means. Here is my recent upload command:
scp -i ~/.ssh/id_rsa_llvm.pub clang+llvm-11.0.1-x86_64-linux-gnu-ubuntu-20.10.tar.xz tes...@releases-origin.llvm.org:/home/testers
My public key on the LLVM side, id_rsa_llvm.pub, identifies the upload as coming from me. It may be better to change the name of that public key to something like
id_rsa_nnelson.pub
Or possibly some identifier instead of nnelson assigned by LLVM.
The public key named on the scp command uniquely identifies the source of the upload; it was previously uploaded to LLVM. Authentication occurs when the user's side proves possession of the matching private key while setting up the encrypted channel for the file transfer, and the LLVM side checks that proof against the user's public key.
The determination of user trustworthiness is tied to the user's
public key and happens by some method external to the use of the
keys. I expect it would be determined by the quality of past
uploads and perhaps by the degree to which others at LLVM can
vouch for that user. This has the feel of a MySQL database
recording the user's name, public key name, upload activity, and
community evaluations toward some degree of trustworthiness. The
release page could also offer upvotes and downvotes for each
release file to help in that rating.
At this point we need an SSH log on the LLVM side that we can parse to show which keys were used for which uploaded files. The parse would run at some convenient frequency and could automatically update the MySQL DB and provide activity reporting. Moving the uploaded files and updating the release page could possibly be automated as well; that last part depends on settling the details and format of the process.
Getting the SSH log working properly seems like the stretch part at the moment, but it appears to be the obvious direction.
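For instance, with a typical sshd logging key fingerprints (older OpenSSH needs LogLevel VERBOSE for this), the accepted-key lines can be pulled out roughly like this; the log path follows the Debian convention and the field positions vary by system, so treat this as a sketch:

  # List user and key fingerprint for each successful publickey login.
  grep 'Accepted publickey' /var/log/auth.log | awk '{print $1, $2, $3, $9, $NF}'
  # Printed fields: month, day, time, user, key fingerprint.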
Neil Nelson
I just realized that the uploaded file's hash value needs to be easy to handle and tied to subsequent uses of the file. It is the hash value that ties the primary file back to the user upload.
To automate the use of the hash value, we might upload it in a file via scp, using the same public key as for the primary upload and a file name tied to the primary file's name. Putting that hash value in the MySQL DB along with the primary file name will be useful.
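Concretely, that might look like the following, reusing the key and destination from the earlier upload command; the .sha256 suffix is just a suggested naming convention:

  # Hash the primary file, then ship the hash file over the same authenticated channel.
  sha256sum clang+llvm-11.0.1-x86_64-linux-gnu-ubuntu-20.10.tar.xz \
    > clang+llvm-11.0.1-x86_64-linux-gnu-ubuntu-20.10.tar.xz.sha256
  scp -i ~/.ssh/id_rsa_llvm.pub \
    clang+llvm-11.0.1-x86_64-linux-gnu-ubuntu-20.10.tar.xz.sha256 \
    tes...@releases-origin.llvm.org:/home/testers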
Neil Nelson
Hi Tom,
I would strongly advise against just storing the keys in some files on
some file system, much less if that machine has internet access. They
are just too easy to copy and you won't even notice they leaked. From a
pure security perspective, I guess in general there are two basic ways
to securely manage the signing keys:
1. Centralized: Set up (or pay for) a central server that keeps the keys
and allows selected users to upload and sign files/hashes via UI or some
command line tool.
Pro:
* one central place to manage the keys
* central auditing options, you can log who signed which files
Con:
* We need to run such a service securely or pay someone to do so. So we
need to trust the admin of such an infrastructure.
* Central place to get hacked.
2. Decentralized: Use an "LLVM certificate" to sign multiple packager
certificates. Then the packagers can use their certificates to sign the
files they just created on behalf of LLVM.
Pro:
* no server infrastructure needed, just openssl and a bunch of scripts
* you can track releases back to users via the signatures
* Keys can be stored on simple USB hardware security modules (e.g.
YubiKey) to avoid leaking. You can have multiple of these for
redundancy.
Con:
* Users need to take care of the keys themselves.
* These signatures are harder to verify, as you need to verify the
certificate chain and also check for revoked/expired certificates. Yet
this can also be scripted.
* Impossible to audit: we never know which files the users actually
signed with their keys.
* In case of distributing HSMs: we need to make sure that people can
actually buy these in their country, as there might be export
regulations in place.
I'm not so familiar with the release process, so I can't say which
option is better.
Best,
Christian
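To make option 2 concrete, a rough openssl sketch of such a certificate chain might look like this; every file name and subject string below is invented for illustration, not an agreed layout:

  # One-time: create the LLVM root key and a self-signed certificate.
  openssl req -x509 -newkey rsa:4096 -keyout llvm-ca.key -out llvm-ca.crt \
    -days 3650 -nodes -subj "/CN=LLVM Release Signing CA"
  # Per packager: issue a certificate signed by the LLVM root.
  openssl req -newkey rsa:4096 -keyout packager.key -out packager.csr \
    -nodes -subj "/CN=Example Packager"
  openssl x509 -req -in packager.csr -CA llvm-ca.crt -CAkey llvm-ca.key \
    -CAcreateserial -out packager.crt -days 365
  # Packager signs a release file; anyone can verify against the chain.
  openssl dgst -sha256 -sign packager.key -out release.tar.xz.sig release.tar.xz
  openssl verify -CAfile llvm-ca.crt packager.crt
  openssl dgst -sha256 -verify <(openssl x509 -in packager.crt -pubkey -noout) \
    -signature release.tar.xz.sig release.tar.xz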
I have mixed feelings about automatic signing. Since releases are done
by humans anyway, I don't see much gain by avoiding humans in the
signing process. Of course, there are some factors to consider.
First of all, it is quite clear that LLVM currently relies on GitHub
for the repo. No offense meant but I honestly doubt that anyone
actively monitors whether the repo has not been tampered with -- how
would you go about that anyway? So I suppose it matters little whether
you sign release tarballs automatically or manually because whatever you
sign might be forged at this point already.
Not saying this couldn't be improved. For example, in Gentoo we sign
all commits (but I don't think GH supports rejecting unsigned commits).
Then you could at least verify that every commit corresponds to a repo
state signed by someone trusted enough to have been given push access.
Secondly, I would presume any key stored outside your own trusted
infrastructure to be compromised. Now, if we were talking about signing
the artifacts of automated builds (e.g. nightlies), then I'd suggest
using separate keys: one for automated signatures (kept on the server)
and one for manual signatures (kept with you). But if we're talking
about final semi-manual releases, then I think you should keep the key
with you.
If you're planning to use PGP, please remember not to upload the secret
portion of the primary key to the server and keep that on you only.
This will make it possible to easily replace the signing subkey
if the server is compromised.
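In gpg terms that split might look roughly like this, where FPR stands for the primary key's fingerprint and the user ID and expiry are placeholders:

  # Keep the certify-only primary key offline; only the subkey secret goes out.
  gpg --quick-generate-key "LLVM Release Signing <releases@example.org>" rsa4096 cert never
  gpg --quick-add-key FPR rsa4096 sign 1y
  # Export only the secret signing subkey for the server; the primary secret stays with you.
  gpg --export-secret-subkeys --armor FPR > signing-subkey.asc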
Thirdly, this also requires a trusted release process. You need to
verify that the binaries weren't tampered with before signing them,
so I guess they need to be signed too. It might just be simpler to let
release testers upload binaries along with their signatures, and just
provide an explicit list of who signed what rather than a single signing
key.
--
Best regards,
Michał Górny
It would be good to articulate what you believe the benefit is here.
Signing is generally a process of associating an identity with an
artefact so that attestations can be made about that artefact.
Git hashes are intrinsically signatures. They associate the artefact of
the latest commit with the identity of the Merkle tree that defines the
history. As a result, they allow a user to validate that the code that
they have is the same as some other repository. Someone can look at
their local depth-1 checkout and validate that it is part of the history
of the public repository.
The simplest way for a user to get a cryptographic attestation that they
have files that correspond to a revision in our git tree is to get a
depth 1 checkout of the repo. This is currently 140MB of data to
transfer with git. The extracted tree is 747MB including 149MB of git
metadata. The xz-compressed tarball sizes are very different: 86MB
without the git info, 230MB with, so there's a big size saving to be had
by not including the git data.
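For reference, that verification is just the following; the tag name is one example release:

  # Fetch only the release revision and confirm the commit hash matches the announcement.
  git clone --depth 1 --branch llvmorg-11.0.1 https://github.com/llvm/llvm-project.git
  git -C llvm-project rev-parse HEAD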
Presumably the goal here is to tie the hash of the tarball to a specific
git revision with less overhead than including the full git state in the
tarball. The core idea here is to allow folks that download the tarball
to delegate verifying that it matches the git repo to some other entity.
The simplest way for a user to do this is to grab the tarball over HTTPS
directly from GitHub using a URL like:
https://api.github.com/repos/llvm/llvm-project/tarball/0f59d099571d3d803b54e2ce06aa94babb9b26db
This gives a 125MB tarball, so slightly smaller than a depth-1 git
checkout (same git commit). GitHub provides a live attestation that
this tarball corresponds to the specific revision. You can verify
GitHub's TLS certificate to check the identity of the entity providing
the attestation, and so, if you trust GitHub not to lie to you about
something that's trivial to verify by doing a git clone, you have
the guarantee.
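In script form, that amounts to the following; the recorded hash is whatever value was published or noted on first download:

  # Fetch the tarball for a pinned revision over HTTPS and check its hash.
  curl -L -o llvm.tar.gz \
    https://api.github.com/repos/llvm/llvm-project/tarball/0f59d099571d3d803b54e2ce06aa94babb9b26db
  sha256sum llvm.tar.gz   # compare against a previously recorded value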
I assume, however, that your use case requires *offline* verification.
This gets more tricky because any offline verification of signatures
also requires a revocation mechanism and policy. If our signing key is
compromised and someone signs a load of tarballs of LLVM + malware as
corresponding to a specific git revision that is publicly auditable and
doesn't include malware then how do we revoke those signatures?
This gets even more complicated once we start talking about binaries.
Signatures of binaries are typically used to assert that a specific
entity performed the build. Binary signatures of open source projects
typically attempt to associate the identity of the binary with the
identity of the specific source code from which the build was run.
For the former to be useful, there needs to be some notion of identity
for the folks doing the build. If your plan is for individual
community members to be able to upload builds and have them signed, then
what will the process be for authorising people to upload builds?
There's a big difference between builds that we can produce from CI VMs
that are initialised to a well-defined state before the build and builds
run on a random developer's machine that may be compromised.
For the signature to be useful for associating the build with a source
revision, it needs to be verifiable, which means that the build needs to
be reproducible. I believe LLVM does now support a reproducible build
configuration; do all of the release snapshots use it? If someone
runs a reproducible build and gets output different from the published
binaries, what is the revocation policy?
Finally, what is the process for verifying the integrity of the binaries
on the client? Normally this is something that's tightly coupled with
the package management infrastructure. Windows MSIs, Linux RPMs and
Debs, FreeBSD pkgs all use different kinds of signature (including
different signing algorithms, different signature serialisation formats,
and even different scopes of what is signed). Tarballs have no
intrinsic signature mechanism and so would need to be checked by hand.
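Checking by hand here typically means a detached-signature verification along these lines (file names illustrative, and the signer's public key must already be imported):

  gpg --verify llvm-project-11.0.1.tar.xz.asc llvm-project-11.0.1.tar.xz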
Operationally:
- If a user finds a signature mismatch, what does it tell them?
- If we discover a malicious binary and need to revoke its signature,
how do we do that?
Without a lot more detail, I am opposed to adding generic signing
infrastructure. It adds complexity and the perception of security. We
need to *very* clearly establish the threat model and security
guarantees that we think we are providing before we can discuss whether
any given signature workflow actually achieves these guarantees.
David
The public/private keys identify the unique source of whatever is uploaded. The reason I suggested the hash be uploaded in a file with a name tied to the primary upload is that a program can then automatically confirm that the primary file is in good shape using the hash value.
If the primary upload were bogus, hashing the bogus file and
sending that hash value by email would simply let the receiver of
the email confirm that hash value against the bogus primary
upload. Nothing is gained by sending the hash value outside the
channel protected by the encryption keys.
Neil Nelson
Thanks for this reply, this is really helpful.
To me the benefit of some kind of automatic signing would just be to
save time and also avoid mistakes in the current manual process.
It sounds like from this response and others that using GitHub Actions
or some other CI system to do the signing may not be the best approach.
I may just try to script more of what I'm currently doing on my local
machine, as that will help provide some of the benefits I'm looking for.
> The simplest way for a user to do this is to grab the tarball over HTTPS
> directly from GitHub using a URL like:
>
> https://api.github.com/repos/llvm/llvm-project/tarball/0f59d099571d3d803b54e2ce06aa94babb9b26db
>
GitHub automatically adds a tarball to the release page for us:
https://github.com/llvm/llvm-project/archive/llvmorg-11.0.1.tar.gz
But, we stopped relying on this, because we had a user report that the
tarball format was not stable, so you weren't guaranteed to get the
exact same bits each time you download it. I'm not sure if this same
issue affects the tarballs accessed by using a git commit hash as well.
-Tom
It does not, for anything identified by a hash or a stable tag (you can
break them by updating a tag; maybe LLVM moved the tags for
releases at some point?).
The FreeBSD ports infrastructure uses these for builds and stores a
SHA256 hash and size of the tarball that was available when the port was
created (and the timestamp when these were checked) in the distinfo file
that is used by the fetch step of the build. We would be unable to
build any of the packages that used GitHub sources if they changed
because the build system would detect that as tampering. A quick grep
over a somewhat old checkout of the ports tree tells me that there are
around 800 packages built from these tarballs. FreeBSD recommends that
port maintainers use the hash ID if there are any doubts that the
upstream project will keep the tag pointed at the same commit.
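For illustration, a distinfo entry has roughly this shape; the timestamp, hash, and size below are invented:

  TIMESTAMP = 1610000000
  SHA256 (llvm-llvm-project-llvmorg-11.0.1_GH0.tar.gz) = 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
  SIZE (llvm-llvm-project-llvmorg-11.0.1_GH0.tar.gz) = 131072000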
David