Now that the bcrypt functionality[1] has been deployed, I'd like to move forward with a further proposal for a revamped Clojars. I'd be happy to get your opinions on the following.
Clojars should be a frontend for deploying jars directly to a repository hosted on S3 for improved redundancy and reliability.
## Separate Repositories
It's desirable to have as few snapshot repositories in your list as possible in order to speed up dependency resolution. In addition, the presence of snapshots introduces undesirable semantics for version ranges. Because of this, the final release of Leiningen 2 will not have any snapshot repositories in its default list. This means Clojars will need to serve a separate repository to segregate releases from snapshots.
Rather than the Maven world's traditional snapshots/releases split, the original repository will be left alone as an "anything goes" repo while the new next-generation repository would only accept releases.
## Maintaining Backwards Compatibility
Leiningen 1 will only check the old repository, but a redirect rule can be added to it so that artifacts that are not found can fall through to the new repository. Some restrictions may be required here; for instance artifacts that have had pushes to the new repository should not accept uploads to the old anymore.
In addition, compatibility needs to go the other way. At the time the new repository is launched, it should be seeded with all non-snapshot artifacts from the old that are grandfathered in. The two repositories should share a single user database and group permissions list.
## Requirements
The new repository will not accept snapshots. We may also want to impose additional requirements on uploads. One possibility would be to require signing of jars. We need to do some investigation around how this can be done with the minimum hassle as well as discussing further policies that can be built around it.
Another possibility is to require more elements of the pom to be filled out. In particular, fields of interest include description, URL, licenses, and SCM availability. Having a richer set of metadata available could prove valuable down the line.
## Uploads
Uploads to the new repository should be done over HTTP using the standard Maven deployment mechanism. This will allow greater tooling interoperability as well as reusing existing code and shelving the custom SCP uploader. Jars once uploaded should be stored on S3 so that downtime of the machines hosting the Clojars application will not affect availability of jars. It will also make setting up a Clojars instance for development much simpler since everything can be done in-process.
## Migration
The first step is to migrate the database to PostgreSQL. Then both the old and new repositories can share a database. Then, once the HTTP upload functionality is finished, the new repository should be seeded with releases from the old, and it can start accepting new uploads.
> Rather than the Maven world's traditional snapshots/releases split, the original repository will be left alone as an "anything goes" repo while the new next-generation repository would only accept releases.
I like this plan and would suggest that in the new repo, jars with duplicate version numbers should not be permitted. For example, this directory shows several jars with the same version number: http://clojars.org/repo/aleph/core/0.6.0-SNAPSHOT/
I also experimented and noticed that if I use "lein push" multiple times with the same (release) version number then I overwrite a jar already on clojars. I think the new Clojars should instead have a policy of immutable dependencies, so that users of these libraries can be sure the bytecode won't change. Probably requiring monotonicity in version numbers is also preferable.
> The new repository will not accept snapshots. We may also want to impose additional requirements on uploads. One possibility would be to require signing of jars.
I particularly like this idea (or at least making signing an easy feature), because use of third party dependencies is based in part on the reputation of the authors.
> Another possibility is to require more elements of the pom to be filled out. In particular, fields of interest include description, URL, licenses, and SCM availability. Having a richer set of metadata available could prove valuable down the line.
What about linking the jars (in an automated way) to github? If lein/clojars could easily match artifacts to a github commit, that would perhaps serve as a source of rich metadata.
I was also quite intrigued by the suggestion from ClojureScript One that was forked into this plugin: https://github.com/tobyhede/lein-git-deps
The idea is that git URLs could be a useful way to manage source dependencies. I'm naively wondering if it might be good for the new Clojars to index dependency releases hosted on GitHub.
"Arthur D. Edelstein" <arthuredelst...@gmail.com> writes:
> I like this plan and would suggest that in the new repo, jars with duplicate version numbers should not be permitted. For example, this directory shows several jars with the same version number: http://clojars.org/repo/aleph/core/0.6.0-SNAPSHOT/
This should be taken care of by disallowing snapshots, actually.
> I also experimented and noticed that if I use "lein push" multiple times with the same (release) version number then I overwrite a jar already on clojars. I think the new Clojars should instead have a policy of immutable dependencies, so that users of these libraries can be sure the bytecode won't change. Probably requiring monotonicity in version numbers is also preferable.
Good idea; this should be explicitly enforced.
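A minimal sketch of how such a policy could be enforced at upload time, assuming purely numeric dotted versions. The function names and the numeric-only version format are illustrative assumptions, not anything Clojars actually implements:

```python
def parse_version(version):
    """Split a dotted numeric version such as "1.2.10" into a tuple of ints.

    Assumption: release versions carry no qualifiers like "-beta1".
    """
    return tuple(int(part) for part in version.split("."))


def rejection_reason(existing_versions, new_version):
    """Return None if the upload is acceptable, else a human-readable reason."""
    if new_version.endswith("-SNAPSHOT"):
        return "snapshots are not accepted"
    if new_version in existing_versions:
        # Immutability: a released version can never be overwritten.
        return "version %s already exists and is immutable" % new_version
    newest = max(existing_versions, key=parse_version, default=None)
    if newest is not None and parse_version(new_version) < parse_version(newest):
        # Monotonicity: every new release must sort after the latest one.
        return "version %s is older than %s" % (new_version, newest)
    return None
```

Note that strict monotonicity would also reject a bugfix release on an older branch (say 1.0.1 after 1.1.0 is out), so that half of the policy may be worth relaxing.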
> What about linking the jars (in an automated way) to github? If lein/clojars could easily match artifacts to a github commit, that would perhaps serve as a source of rich metadata.
The poms generated by Leiningen will automatically contain links to the git repository if it was generated from a project that's a git checkout. But perhaps we should go the extra measure and require this.
> I was also quite intrigued by the suggestion in Clojurescript One that was forked to this plugin: https://github.com/tobyhede/lein-git-deps
> The idea that git urls could be a useful way to manage source dependencies. I'm naively wondering if it might be good for the new Clojars to index dependency releases hosted on github.
Keeping the scm metadata around is a fine idea. The idea of generating a repo full of jars based on tags found in registered project git repos is one that's occurred to me (indeed, this is what http://melpa.milkbox.net/ does), but I think that would be a different kind of repository from Clojars.
One of the features that the team I'm working with really likes is uploading new artifacts with scp :-); it's so simple, just `scp x.jar pom.xml clojars.org:`.
I agree that support for the standard Maven way of uploading would be great.
> Uploads to the new repository should be done over HTTP using the standard Maven deployment mechanism. This will allow greater tooling interoperability as well as reusing existing code and shelving the custom SCP uploader. Jars once uploaded should be stored on S3 so that downtime of the machines hosting the Clojars application will not affect availability of jars. It will also make setting up a Clojars instance for development much simpler since everything can be done in-process.
On Fri, Mar 9, 2012 at 10:09 PM, Phil Hagelberg <p...@hagelb.org> wrote:
> Hello folks.
> Now that the bcrypt functionality[1] has been deployed, I'd like to move forward with a further proposal for a revamped Clojars. I'd be happy to get your opinions on the following.
> Clojars should be a frontend for deploying jars directly to a repository hosted on S3 for improved redundancy and reliability.
> ## Separate Repositories
> It's desirable to have as few snapshots repositories in your list as possible in order to speed up dependency resolution. In addition, the presence of snapshots introduces undesirable semantics for version ranges. Because of this, the final release of Leiningen 2 will not have any snapshot repositories in its default list. This means Clojars will need to serve a separate repository to segregate releases from snapshots.
> Rather than the Maven world's traditional snapshots/releases split, the original repository will be left alone as an "anything goes" repo while the new next-generation repository would only accept releases.
> ## Maintaining Backwards Compatibility
> Leiningen 1 will only check the old repository, but a redirect rule can be added to it so that artifacts that are not found can fall through to the new repository. Some restrictions may be required here; for instance artifacts that have had pushes to the new repository should not accept uploads to the old anymore.
> In addition, compatibility needs to go the other way. At the time the new repository is launched, it should be seeded with all non-snapshot artifacts from the old that are grandfathered in. The two repositories should share a single user database and group permissions list.
Do separate repos get more than just shorter defaults? Does
> ## Requirements
> ...
> Another possibility is to require more elements of the pom to be filled out. In particular, fields of interest include description, URL, licenses, and SCM availability. Having a richer set of metadata available could prove valuable down the line.
I'm a fan of using more info from the pom. Maybe not of requiring it. Looking at a page like http://rubygems.org/gems/rails and being able to see all the places to find info is nice.
> ## Uploads
> Uploads to the new repository should be done over HTTP using the standard Maven deployment mechanism. This will allow greater tooling interoperability as well as reusing existing code and shelving the custom SCP uploader. Jars once uploaded should be stored on S3 so that downtime of the machines hosting the Clojars application will not affect availability of jars. It will also make setting up a Clojars instance for development much simpler since everything can be done in-process.
+1 for HTTP uploads and S3. The HTTP uploads will make testing much easier than faking scp.
> ## Migration
> The first step is to migrate the database to PostgreSQL. Then both the old and new repositories can share a database. The once the HTTP upload functionality is finished, the new repository should be seeded with releases from the old, and it can start accepting new uploads.
What about search? Currently it is some SQLite queries. Are there plans to continue that in Postgres for a while? Any thoughts on Lucene or, since it will be on Heroku, using https://addons.heroku.com/searchify ?
I think it would have to live at a different URL. It could be the same domain though.
> I'm a fan of using more info from the pom. Maybe not of requiring it. Looking at a page like http://rubygems.org/gems/rails and being able to see all the places to find info is nice.
I think requiring a license and URL is pretty reasonable. Description helps a lot for search results, but I could budge on it. On the other hand it doesn't take long to fill out. We should also check for the FIXME defaults that Leiningen places in new project skeletons and reject those.
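A rough sketch of what such a check might look like, using only Python's standard XML parser. The choice of required fields follows the paragraph above; treating any occurrence of the string "FIXME" as a leftover skeleton default is an assumption made for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven 4.0.0 pom files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def metadata_problems(pom_xml):
    """Return a list of reasons the pom's metadata is insufficient (empty if fine)."""
    root = ET.fromstring(pom_xml)
    problems = []
    for field in ("description", "url"):
        text = (root.findtext("m:" + field, default="", namespaces=POM_NS) or "").strip()
        if not text:
            problems.append("missing " + field)
        elif "FIXME" in text:
            # Assumption: skeleton placeholders can be spotted by this marker.
            problems.append(field + " still contains the FIXME placeholder")
    if root.find("m:licenses/m:license", POM_NS) is None:
        problems.append("missing license")
    return problems
```

Returning a list of reasons rather than a boolean means the same function could feed an error page that tells the uploader exactly what to fix.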
> What about search? Currently it is some sqlite queries. Are there plans to continue that in postgres for awhile? Any thoughts on lucene or since it will be on heroku using https://addons.heroku.com/searchify ?
I was originally thinking Lucene since we've already got the code for searching that in lein search, but I think it's up to whomever implements it. Postgres full-text search would have the advantage of living in the database, and thus it wouldn't have to be rebuilt when a new node is spun up. Fewer moving parts is always nice.
> > What about search? Currently it is some sqlite queries. Are there plans to continue that in postgres for awhile? Any thoughts on lucene or since it will be on heroku using https://addons.heroku.com/searchify?
> I was originally thinking Lucene since we've already got the code for searching that in lein search, but I think it's up to whomever implements it. Postgres full-text search would have the advantage of living in the database, and thus it wouldn't have to be rebuilt when a new node is spun up. Fewer moving parts is always nice.
Since search on clojars is being discussed, I should mention that I am currently developing a website that lets one search for clojure vars. Clojure source (.clj) files are extracted from jar files, parsed for vars (functions, macros, etc.) and then indexed by the solr/lucene search engine. I haven't worked on a proper front end yet, but you can get the idea:
Currently the search database includes the latest release of every artifact in Clojars. I'm hoping in the future to extend it to Maven Central and possibly GitHub as well. I can see that project descriptions are something I should add to the search.
Any suggestions or comments would be welcome at this early stage. Thanks!
After some discussion on IRC I've come up with a slightly different
proposal for a next-gen repository. It involves releases being
deployed to the existing repository and promoted to the releases repo
on S3 only when they pass the necessary qualifications.
* Library author invokes `lein deploy clojars` on a non-snapshot version.
* Leiningen invokes gpg to sign both the jar and the pom.
* Leiningen transmits the jar, pom, and two signatures over HTTP.
* Clojars accepts each file.
* After each file is transmitted, the set of artifacts as a whole is
evaluated to see if it qualifies for promotion to the releases
repository. (signatures match, pom metadata is found)
* If so, they're all uploaded to S3.
* If not, a list of reasons why it doesn't qualify is available to the
owner through the web UI.
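The flow above can be sketched roughly as follows. Because files arrive one at a time, the check is meant to be re-run after every upload, and promotion happens only once it reports no blockers. The function name, the `verify` callback, and the use of `.asc` as the signature suffix are assumptions for illustration; the pom metadata check is left out here:

```python
def promotion_blockers(base, version, uploaded, verify):
    """List the reasons a release cannot yet be promoted (empty list = promote).

    base     -- file name stem, e.g. "lib-1.0.0" (hypothetical naming)
    uploaded -- set of file names received so far
    verify   -- callable(file_name, signature_name) -> bool, standing in for gpg
    """
    blockers = []
    if version.endswith("-SNAPSHOT"):
        blockers.append("snapshot versions are not promoted")
    for ext in ("jar", "pom"):
        name = "%s.%s" % (base, ext)
        sig = name + ".asc"
        if name not in uploaded:
            blockers.append("missing " + name)
        elif sig not in uploaded:
            blockers.append("missing signature " + sig)
        elif not verify(name, sig):
            blockers.append("signature does not verify for " + name)
    return blockers
```

The returned list doubles as the "list of reasons" that would be shown to the owner in the web UI.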
Part of the reason this is preferable to deploying directly to the
releases repository is that Aether has no notion of transactionality,
so it must accept each jar and pom file before it knows whether they
are really qualified to be deployed. We could fake transactionality on
top of that, but if we're going to need a state machine then we might
as well use the state of the existing repository.
I think this will turn out to be much simpler than the original
proposal since it still only involves a single Clojars instance. We
wouldn't even need to port it off SQLite, but we would still get the
benefits of redundancy from S3.
Would like to hear your thoughts on this approach.
On Wed, Jun 13, 2012 at 10:49 PM, Phil Hagelberg <p...@hagelb.org> wrote:
> After some discussion on IRC I've come up with a slightly different
> proposal for a next-gen repository. It involves releases being
> deployed to the existing repository and promoted to the releases repo
> on S3 only when they pass the necessary qualifications.
> * Library author invokes `lein deploy clojars` on a non-snapshot version.
> * Leiningen invokes gpg to sign both the jar and the pom.
> * Leiningen transmits the jar, pom, and two signatures over HTTP.
> * Clojars accepts each file.
> * After each file is transmitted, the set of artifacts as a whole is
> evaluated to see if it qualifies for promotion to the releases
> repository. (signatures match, pom metadata is found)
> * If so, they're all uploaded to S3.
> * If not, a list of reasons why it doesn't qualify is available to the
> owner through the web UI.
> Part of the reason this is preferable to deploying directly to the
> releases repository is that Aether has no notion of transactionality,
> so it must accept each jar and pom file before it knows whether they
> are really qualified to be deployed. We could fake transactionality on
> top of that, but if we're going to need a state machine then we might
> as well use the state of the existing repository.
> I think this will turn out to be much simpler than the original
> proposal since it still only involves a single Clojars instance. We
> wouldn't even need to port it off SQLite, but we would still get the
> benefits of redundancy from S3.
> Would like to hear your thoughts on this approach.
> thanks,
> Phil
Couldn't you achieve the same effect by using a staging area and
verification process, before uploading to S3?
Is the assumption that every deployment to the existing repository would
automatically get promoted, or would there be some kind of lever that
people could pull? I didn't catch that from your description.
If the assumption is that everything is going to be auto promoted, then you
could use a staging area to fake transactionality without going through an
extra repo.
On Thu, Jun 14, 2012 at 12:10 PM, Paul Stadig <p...@stadig.name> wrote:
> Couldn't you achieve the same effect by using a staging area and
> verification process, before uploading to S3?
That would probably achieve the same effect, but inventing a new place
to store state makes things more complicated, especially for the
future when we have to scale beyond one node. In this proposal the
existing repository *is* the staging area.
> Is the assumption that every deployment to the existing repository would
> automatically get promoted, or would there be some kind of lever that people
> could pull? I didn't catch that from your description.
Every deployment would automatically be considered for promotion, but
to be promoted they will have to qualify first by 0) not being a
snapshot, 1) declaring all required metadata, and 2) including
signatures that verify for both the pom and the jar.
On Jun 14, 12:49 pm, Phil Hagelberg <p...@hagelb.org> wrote:
> After some discussion on IRC I've come up with a slightly different
> proposal for a next-gen repository. It involves releases being
> deployed to the existing repository and promoted to the releases repo
> on S3 only when they pass the necessary qualifications.
> * Library author invokes `lein deploy clojars` on a non-snapshot version.
> * Leiningen invokes gpg to sign both the jar and the pom.
My concern is that if we start enforcing it, we need to ensure the jar
signing procedure is straightforward for people who do not use gpg
frequently (and I would expect a large proportion of the community are
in that category). As far as I can tell, enforcing jar-signing
introduces little benefit at present, as tools do not yet check
the signatures.
Clojars' only reason for existence is to address usability problems
with Maven Central and lower the barrier to entry. If we reach a point
(either through us changing or Central changing) where pushing to
Clojars is equally onerous as pushing to Central, we should stop
accepting uploads and instead become a discovery tool for finding
Clojure libraries in Central. Otherwise we're just needlessly
fragmenting.
I would like to know what ideas people have for how jar-signing is
actually to be enforced and used. There seems to me very little point
in signatures unless you actually check the signer against a list.
That's relatively easy in a conventional Linux distro setting with a
coordinated release team, not so easy in our case unless I'm missing
something.
I do think signing is a good idea, I just think we should have a
better idea of where we're going with it in order to help make
decisions like this on how to transition. This may well have been
covered in IRC conversations that I missed.
> * Leiningen transmits the jar, pom, and two signatures over HTTP.
> * Clojars accepts each file.
> * After each file is transmitted, the set of artifacts as a whole is
> evaluated to see if it qualifies for promotion to the releases
> repository. (signatures match, pom metadata is found)
> * If so, they're all uploaded to S3.
> * If not, a list of reasons why it doesn't qualify is available to the
> owner through the web UI.
> Part of the reason this is preferable to deploying directly to the
> releases repository is that Aether has no notion of transactionality,
> so it must accept each jar and pom file before it knows whether they
> are really qualified to be deployed. We could fake transactionality on
> top of that, but if we're going to need a state machine then we might
> as well use the state of the existing repository.
Right. The trouble with the HTTP upload mechanism is it's difficult to
validate and difficult to give feedback.
I do like the idea of being able to view the history and status of
your recent pushes through the web UI. Although if possible Leiningen
should print it immediately as well. Perhaps printing a link to the
status page would be enough.
The main (and only?) benefit I see to staging to the existing
repository rather than to an inaccessible temporary directory is to
allow for an incremental migration. If one of your dependencies has
issues blocking it from entering the more stringent repo, you can at
least upload and you will be automatically promoted once upstream is
fixed.
This does have the implication that releases are not immutable, at
least in the staging repository. This is ok by me, it's been quietly
the current policy and generally been more useful (fixing mistakes in
metadata) than it has been painful. There has been some pain though.
People often don't understand caching and are surprised when their
change doesn't make it to clients.
> I think this will turn out to be much simpler than the original
> proposal since it still only involves a single Clojars instance. We
> wouldn't even need to port it off SQLite, but we would still get the
> benefits of redundancy from S3.
I am very much in favour of smaller incremental changes than trying to
tie multiple changes together.
On Jun 15, 6:19 am, Phil Hagelberg <p...@hagelb.org> wrote:
> On Thu, Jun 14, 2012 at 12:10 PM, Paul Stadig <p...@stadig.name> wrote:
> > Couldn't you achieve the same effect by using a staging area and
> > verification process, before uploading to S3?
> That would probably achieve the same effect, but inventing a new place
> to store state makes things more complicated,
Upsides (of using the existing repo as staging):
* it's visible
* it doesn't break any existing workflows.
* it allows for incremental migration
Downsides:
* it exposes incorrect jars/metadata
* it means two sets of data need to be maintained over time
> especially for the future when we have to scale beyond one node.
The only scenario I can think of where we'd need multiple upload nodes
is if we're doing some insane level of processing. Perhaps metadata
extraction? Super fancy doc generation? But if we're blocking the
upload process with something that expensive I think we're doing it
wrong anyway.
> In this proposal the
> existing repository *is* the staging area.
> > Is the assumption that every deployment to the existing repository would
> > automatically get promoted, or would there be some kind of lever that people
> > could pull? I didn't catch that from your description.
> Every deployment would automatically be considered for promotion, but
> to be promoted they will have to qualify first by 0) not being a
> snapshot, 1) declaring all required metadata, and 2) including
> signatures that verify for both the pom and the jar.
I think it's important this happens automatically and without any
unnecessary delays (no "nightly" or "weekly" batch silliness). My
understanding of Phil's proposal is that it's triggered automatically
when you (or one of your dependencies?) is pushed. Plus a once-off run
against all the existing content when the change is introduced.
You raise some good points as always. Some more context is definitely needed.
On Sun, Jun 17, 2012 at 1:38 AM, Alex Osborne <a...@meshy.org> wrote:
> My concern is that if we start enforcing it, we need to ensure the jar
> signing procedure is straightforward for people who do not use gpg
> frequently (and I would expect a large proportion of the community are
> in that category). As far as I can tell, enforcing jar-signing
> introduces little benefit at present, as tools do not yet check
> the signatures.
Indeed, I've been thinking of signatures as the foundation on which
further security measures can be laid. It's necessary, but certainly
not sufficient on its own. That said, we currently have rudimentary
signature checks implemented on Leiningen master:
http://p.hagelb.org/deps-verify.html
This only gives us a binary y/n
as far as whether the jars are signed by someone with a key on MIT's
public key server, but it can be expanded to determine whether there's
a valid trust path from your trusted keys to the signer of the
artifact.
What will be checked by default is still an open question, but without
having signatures it's impossible to build a system which can be
trusted. I don't believe simply supporting signatures is enough;
rubygems supports signatures but nobody I've talked to is even aware
of this fact since signing is not required. I expect a typical
deployment would look something like `lein do deps :verify, test,
uberjar`. A bit of background reading:
http://branchandbound.net/blog/security/2012/03/crossbuild-injection-... (followed of course by http://p.hagelb.org/scared.gif)
Of course, care should be taken for the user experience. Right now on
the deploy-asc branch of Leiningen, each repository has a
:sign-releases option. If it's set, then a signature for the jar and
pom will be generated and deployed. If you have a key in place and
your agent is set up, there is nothing more to do; it simply works. So
I believe the only two issues are A) user education; ensuring people
learn how to generate and protect their keys. But this is something
all professional software developers _should_ already know how to do.
Admittedly crypto literacy is not high among developers, but it needs
to be regardless of Leiningen--it's not OK for professionals to remain
ignorant of it. Perhaps practical objections ("people won't bother")
can be raised here, but I don't buy the "it's too hard" argument.
B) is stickier; apparently gpg-agent is not well-integrated with
certain operating systems. On Gnome it will prompt you for your
passphrase if the agent doesn't have it cached, but apparently Macs
and Windows don't work that way out of the box. It looks like this can
be addressed with http://www.funtoo.org/wiki/Keychain, but I haven't
looked into it and the need for further third-party software to
address OS shortcomings is unfortunate. Perhaps deploying is uncommon
enough that entering the passphrase every time is OK.
> Clojars' only reason for existence is to address usability problems
> with Maven Central and lower the barrier to entry. If we reach a point
> (either through us changing or Central changing) where pushing to
> Clojars is equally onerous as pushing to Central, we should stop
> accepting uploads and instead become a discovery tool for finding
> Clojure libraries in Central. Otherwise we're just needlessly
> fragmenting.
Indeed, we need to take care. Clojars is still much more open in that
releases don't require human intervention to push out and there are no
requirements for verbose domain-centric group IDs. And the existing
repo isn't going anywhere either, other than out of the defaults list.
> I would like to know what ideas people have for how jar-signing is
> actually to be enforced and used. There seems to me very little point
> in signatures unless you actually check the signer against a list.
> That's relatively easy in a conventional Linux distro setting with a
> coordinated release team, not so easy in our case unless I'm missing
> something.
Ideally I would like each project to be able to specify who they
trust. There would probably be some keys trusted out of the box (you
have to trust Leiningen itself, for instance) and the keys trusted by
the authors of Leiningen itself seem like a good bet. But then
projects should be able to specify how many steps away from a trusted
source are still considered trusted (7 degrees of Kevin Bacon, etc)
and add and revoke additional keys. Another setting would be to
require more overlapping signatures for a key before it's considered
trusted. I'd imagine corporate projects should want stricter control
than open source community ones.
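To make the "steps away from a trusted source" idea concrete, here is a small sketch that treats key signatures as a directed graph and does a bounded breadth-first search from the project's trusted keys. The data layout and all names are hypothetical, and it ignores revocation and the "overlapping signatures" setting mentioned above:

```python
from collections import deque


def is_trusted(signer, trusted_roots, signed_by, max_steps):
    """Return True if `signer` is within `max_steps` signatures of a trusted key.

    signed_by maps a key id to the set of key ids it has signed (vouched for).
    """
    seen = set(trusted_roots)
    queue = deque((root, 0) for root in trusted_roots)
    while queue:
        key, steps = queue.popleft()
        if key == signer:
            return True
        if steps == max_steps:
            continue  # do not follow signatures beyond the allowed distance
        for vouched in signed_by.get(key, ()):
            if vouched not in seen:
                seen.add(vouched)
                queue.append((vouched, steps + 1))
    return False
```

A corporate project could set `max_steps` to 0 or 1 and list its keys explicitly, while a community project might allow a longer chain.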
So far everything I've read about GPG has been very user-centric
rather than project-centric, but I think this should be doable. This
is on the consuming side.
On the publishing side, one way would be to only accept deploys signed
by the original author or a key directly trusted by the author. Key
rotation makes this interesting since the same author doesn't have the
same key forever; they will definitely need to delegate this
publishing trust in a way that we can distinguish from regular trust.
We definitely need more discussion around these policies; nothing's
been decided. We're still laying the foundation.
> I do like the idea of being able to view the history and status of
> your recent pushes through the web UI. Although if possible Leiningen
> should print it immediately as well. Perhaps printing a link to the
> status page would be enough.
We'll need to look into this; one drawback of HTTP deploys is that
they don't expose stdout for arbitrary messages like scp does. Worst
case is we special case it so it can check the URL being deployed to
and emit a message for Clojars.
> The main (and only?) benefit I see to staging to the existing
> repository rather than to an inaccessible temporary directory is to
> allow for an incremental migration. If one of your dependencies has
> issues blocking it from entering the more stringent repo, you can at
> least upload and you will be automatically promoted once upstream is
> fixed.
Yeah, I was thinking that not requiring any changes to the workflow
would be a big win, but it occurs to me that in Leiningen 2.0.0,
"clojars" will refer to the S3 repo, which can't be deployed directly
to. So I'm on the fence now about using the existing repo vs creating
a new one just for releases. It does seem like fewer moving parts, but
maybe it's making parts do double-duty in ways that are inappropriate.
Needs further thought.
> This does have the implication that releases are not immutable, at
> least in the staging repository. This is ok by me, it's been quietly
> the current policy and generally been more useful (fixing mistakes in
> metadata) than it has been painful. There has been some pain though.
> People often don't understand caching and are surprised when their
> change doesn't make it to clients.
If we go with promoting out of the existing repository, I think we
should save a bit for each jar for whether it's been promoted.
Re-deploying a jar that's been promoted should probably be disallowed,
even if it's to the old repo simply to avoid confusing scenarios.
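As a sketch of that "promoted bit", assuming an in-memory stand-in for the repository (the class and exception names are made up for illustration):

```python
class FrozenArtifactError(Exception):
    """Raised when someone tries to re-deploy an already promoted artifact."""


class StagingRepo:
    def __init__(self):
        self.jars = {}         # (group, name, version) -> file contents
        self.promoted = set()  # coordinates that have been promoted

    def deploy(self, coords, content):
        # Re-deploys are allowed only until the artifact has been promoted.
        if coords in self.promoted:
            raise FrozenArtifactError("%s/%s %s has been promoted" % coords)
        self.jars[coords] = content

    def promote(self, coords):
        self.promoted.add(coords)
```

This keeps the old behaviour (fixing mistakes by re-pushing) for anything that never qualified, while promoted releases become immutable in both places.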
On Sun, Jun 17, 2012 at 2:08 AM, Alex Osborne <a...@meshy.org> wrote:
> Upsides (of using the existing repo as staging):
> * it's visible
> * it doesn't break any existing workflows.
> * it allows for incremental migration
> Downsides:
> * it exposes incorrect jars/metadata
> * it means two sets of data need to be maintained over time
I'm not sure I follow; can you elaborate on this? Would a policy of
freezing a jar in the old repository once it's been promoted to the
new address it?
> The only scenario I can think of where we'd need multiple upload nodes
> is if we're doing some insane level of processing. Perhaps metadata
> extraction? Super fancy doc generation? But if we're blocking the
> upload process with something that expensive I think we're doing it
> wrong anyway.
Quite right; this is probably decades out and not worth worrying about.
> I think it's important this happens automatically and without any
> unnecessary delays (no "nightly" or "weekly" batch silliness). My
> understanding of Phil's proposal is that it's triggered automatically
> when you (or one of your dependencies?) is pushed. Plus a once-off run
> against all the existing content when the change is introduced.
Definitely. I hadn't considered the dependencies case, I suppose it
doesn't make sense to promote a jar if it depends on jars that are
only found in the snapshots repo.
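
The dependency case could be treated as one more promotion blocker. A rough sketch in Python, for illustration only: the function name and the tuple shape for coordinates are assumptions, and `released` stands for whatever set of coordinates is known to be available from a releases-only repository.

```python
def unpromoted_dependencies(dependencies, released):
    """Return the dependencies that would block promotion.

    `dependencies` is an iterable of (group, name, version) tuples declared
    by the jar being considered. `released` is the set of coordinates that
    are available outside the snapshots repo. Both shapes are assumptions
    made for this sketch.
    """
    return [dep for dep in dependencies if dep not in released]
```

An empty result means nothing in the dependency list stands in the way of promotion.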
Phil Hagelberg <p...@hagelb.org> writes:
> On Sun, Jun 17, 2012 at 2:08 AM, Alex Osborne <a...@meshy.org> wrote:
>> Upsides (of using the existing repo as staging):
>> * it's visible
>> * it doesn't break any existing workflows.
>> * it allows for incremental migration
>> Downsides:
>> * it exposes incorrect jars/metadata
>> * it means two sets of data need to be maintained over time
> I'm not sure I follow; can you elaborate on this?
One of the possible gotchas I've been thinking about:
Assume a jar isn't initially promoted because it's incorrect or
incomplete in some way (bad metadata, not signed). You repush
the same version string with good metadata in order to fix it and then
the promotion happens.
But what about users who've already been exposed to the early push of
the jar via the moshpit? They're going to have the incorrect version
cached in ~/.m2.
So I've been looking into it a bit further and it sounds like this
behaviour was fixed in Aether:
I hope that's not saying that having an old version cached means Aether
starts claiming the jar is not found. That would really suck. Surely
if that's the case we would have run into it with the central
clojure.jar outage workaround though.
> Would a policy of freezing a jar in the old repository once it's been
> promoted to the new address it?
Phil Hagelberg <p...@hagelb.org> writes:
> What will be checked by default is still an open question, but without
> having signatures it's impossible to build a system which can be
> trusted. I don't believe simply supporting signatures is enough;
> rubygems supports signatures but nobody I've talked to is even aware
> of this fact since signing is not required.
I agree, but I would take that further and say that just requiring
signatures isn't enough either. How many people consistently check
Maven Central signatures -- even those who are deploying signed jars to
Central themselves?
> Of course, care should be taken for the user experience. Right now on
> the deploy-asc branch of Leiningen, each repository has a
> :sign-releases option. If it's set, then a signature for the jar and
> pom will be generated and deployed. If you have a key in place and
> your agent is set up, there is nothing more to do; it simply works. So
> I believe the only two issues are A) user education; ensuring people
> learn how to generate and protect their keys. But this is something
> all professional software developers _should_ already know how to do.
> Admittedly crypto literacy is not high among developers, but it needs
> to be regardless of Leiningen--it's not OK for professionals to remain
> ignorant of it. Perhaps practical objections ("people won't bother")
> can be raised here, but I don't buy the "it's too hard" argument.
Is installing GPG and learning its CLI going to be a prerequisite to
using Leiningen? I argue that "crypto literacy" and knowing how to use
a particular tool are different things.
Linux users will generally already have it installed due to yum, apt and
friends being dependent on it. OS X and Windows users generally won't.
Even if they do have it installed I suspect there's a good chance it's
not going to be in $PATH (needs verification).
One of the reasons for the lein-clojars plugin is that Windows
doesn't ship an SSH client by default and SSH is not something most
Windows developers have any reason to be familiar with. Forcing Windows
developers to install and learn OpenSSH or PuTTY, which for them is an
obscure tool they may well never need otherwise, is a bad user
experience.
Generally developers even if they don't use SSH understand the concept
of a public/private keypair used for authentication. They don't need to
learn the protocol details before getting started using it. They *do*
need to understand what the lein-clojars "keygen" command is doing at a
conceptual level.
So maybe it's desirable to have a similar plugin that causes Lein to use
a copy of the Bouncy Castle PGP library bundled with the plugin if GPG
isn't installed?
> B) is stickier; apparently gpg-agent is not well-integrated with
> certain operating systems. On Gnome it will prompt you for your
> passphrase if the agent doesn't have it cached, but apparently Macs
> and Windows don't work that way out of the box. It looks like this can
> be addressed with http://www.funtoo.org/wiki/Keychain, but I haven't
> looked into it and the need for further third-party software to
> address OS shortcomings is unfortunate. Perhaps deploying is uncommon
> enough that entering the passphrase every time is OK.
Yes, I think it's OK. ... although maybe it'd be good not to require
prompting for the passphrase twice per deploy (repo credentials *and*
signing key) if possible. I guess that's probably difficult though.
If we're going to require signing anyway, this did get me stumbling
across: http://gpgauth.org/
> Ideally I would like each project to be able to specify who they
> trust. There would probably be some keys trusted out of the box (you
> have to trust Leiningen itself, for instance) and the keys trusted by
> the authors of Leiningen itself seem like a good bet. But then
> projects should be able to specify how many steps away from a trusted
> source are still considered trusted (7 degrees of Kevin Bacon, etc)
> and add and revoke additional keys. Another setting would be to
> require more overlapping signatures for a key before it's considered
> trusted. I'd imagine corporate projects should want stricter control
> than open source community ones.
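
To make the two knobs concrete (how many steps from a trusted key still count, and how many overlapping signatures a key needs), here is a toy sketch in Python. It is not GPG's actual trust model and not anything in Leiningen; `trusted_keys`, `max_steps` and `min_signers` are hypothetical names, and the signature graph is a plain dict from signer to the keys it has signed.

```python
def trusted_keys(roots, signatures, max_steps, min_signers=1):
    """Expand a set of trusted root keys through a signature graph.

    `signatures` maps a signer's key id to the set of key ids it has
    signed. Each round extends trust by one step: an untrusted key becomes
    trusted once at least `min_signers` already-trusted keys have signed
    it. At most `max_steps` rounds are run.
    """
    trusted = set(roots)
    for _ in range(max_steps):
        counts = {}
        for signer in trusted:
            for signed in signatures.get(signer, ()):
                if signed not in trusted:
                    counts[signed] = counts.get(signed, 0) + 1
        newly_trusted = {key for key, n in counts.items() if n >= min_signers}
        if not newly_trusted:
            break  # nothing more can be reached
        trusted |= newly_trusted
    return trusted
```

A stricter project would lower `max_steps` or raise `min_signers`; revoking a key would amount to removing it from `roots` or from the graph before running this.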
That took me a while to understand -- at first I was trying to figure
out why a project's trust paths might ever differ from its dependency
path. I wonder if that means I'm one of your "crypto illiterate". :)
(I've used GPG a little. I've used/admined SSH, SSL, LUKS and friends a
lot. I'm familiar conceptually with asymmetric vs symmetric encryption,
signatures, MACs, the web of trust, various ciphers and hashes etc.
Never had cause to use a PGP-style web of trust model in practice
though so far.)
I think I'm finally seeing where you're heading with this and it
actually sounds quite exciting *if*:
* it can be pulled off without nagging people to the point where they
blindly trust everything just to get the warnings out of their face. I
agree that having good out of the box trust is absolutely crucial here.
* we can manage to get strong adoption which I think means tearing down
barriers like people having to stuff around installing GPG and agents.
That doesn't mean dumbing things down so that people don't have to
understand the concepts, it means making them convenient, especially
making them more convenient than any alternatives.
> So far everything I've read about GPG has been very user-centric
> rather than project-centric, but I think this should be doable. This
> is on the consuming side.
One interesting part is how a project with multiple people with release
access would work. Share the project's private key between them?
> On the publishing side, one way would be to only accept deploys signed
> by the original author or a key directly trusted by the author. Key
> rotation makes this interesting since the same author doesn't have the
> same key forever; they will definitely need to delegate this
> publishing trust in a way that we can distinguish from regular trust.
As well as losing keys, which is going to happen. For that reason alone
I'd be inclined to treat them much like how we do the SSH keys.
> * it can be pulled off without nagging people to the point where they
> blindly trust everything just to get the warnings out of their face. I
> agree that having good out of the box trust is absolutely crucial here.
Note that many places just require you to trust the repository, and
then the repository maintainers have revocation on the sub-certs. All
the App Stores work this way, and it seems a good middle ground. If a
bad actor is detected, clojars could pull the cert. It seems a little
more user friendly than having to expose the trust network. For
starters, I might decide I trust various people on the clojure mailing
list who post a lot, but there is no way for me to know that's who
published a jar unless they also sign their emails or I just trust the
author information.
On Mon, Jun 18, 2012 at 8:41 AM, Alex Osborne <a...@meshy.org> wrote:
> Is installing GPG and learning its CLI going to be a prerequisite to
> using Leiningen? I argue that "crypto literacy" and knowing how to use
> a particular tool are different things.
It will only be necessary for certain operations. Right now those are:
0) deploy releases, 1) verify dependencies or 2) read private repository
credentials from disk.
An alternative for 2 would be to tie into OS level secret storage. I
don't think it helps with 0 and 1, so I'd rather get the portable
implementation right before worrying about this personally, but anyone
interested in this feature could certainly look into it. I recall there
being issues with granularity here on the OS X side: at least with Ruby
it doesn't distinguish between granting access to one ruby program vs
another. But that's such a huge leap over storing credentials in
plaintext that it probably shouldn't be a blocker.
> Generally developers even if they don't use SSH understand the concept
> of a public/private keypair used for authentication. They don't need to
> learn the protocol details before getting started using it. They *do*
> need to understand what lein-clojars "keygen" command is doing at a
> conceptual level.
> So maybe it's desirable to have a similar plugin that causes Lein to use
> a copy of the Bouncy Castle PGP library bundled with the plugin if GPG
> isn't installed?
Interesting idea. I looked into Bouncy Castle and it claimed to
implement signing and verification but not in a way that integrated with
the agent, so I didn't look much further because always getting prompted
is a lousy experience. But it could serve as a fallback considering we
can't really rely on the agent consistently anyway.
> If we're going to require signing anyway, this did get me stumbling
> across: http://gpgauth.org/
Wow, looks interesting. I've wished for ages to be able to authenticate
in the browser with my SSH pubkey, but this would do the trick too.
> (I've used GPG a little. I've used/admined SSH, SSL, LUKS and friends a
> lot. I'm familiar conceptually with asymmetric vs symmetric encryption,
> signatures, MACs, the web of trust, various ciphers and hashes etc.
> Never had cause to use a PGP-style web of trust model in practice
> though so far.)
Right; as far as I know the only systems that get this right are apt and
yum, and they have an arguably centralized model of trust. Maven is the
closest anyone's gotten outside an OS, but I think they're just too vast
of a community to pull it off; it's difficult to build a web of trust
when the requirement for signing was added so late.
I think the Clojure community has a chance to do better because we have
a huge head start by building on Maven, but we're still small enough
that most library authors know each other and even meet face-to-face
from time to time. But it is attempting to do something that hasn't been
done before, so there's a quixotic element to the whole plan.
> I think I'm finally seeing where you're heading with this and it
> actually sounds quite exciting *if*:
> * it can be pulled off without nagging people to the point where they
> blindly trust everything just to get the warnings out of their face. I
> agree that having good out of the box trust is absolutely crucial here.
This will require having unsigned dependencies be a rarity. It will
depend on getting the community on board. I'm thinking of submitting a
talk to the Conj on the topic; we'll see how that goes. But it
definitely needs the experience to be as seamless as possible.
> * we can manage to get strong adoption which I think means tearing down
> barriers like people having to stuff around installing GPG and agents.
> That doesn't mean dumbing things down so that people don't have to
> understand the concepts, it means making them convenient, especially
> making them more convenient than any alternatives.
Yeah, I think I was a little hasty with my first experiments; I was able
to get a very smooth out-of-the-box experience on Gnome, but I didn't
realize that we can't count on that level of seamless integration on
Macs. I suppose it might really come down to how good the Bouncy Castle
implementation is.
>> So far everything I've read about GPG has been very user-centric
>> rather than project-centric, but I think this should be doable. This
>> is on the consuming side.
> One interesting part is how a project with multiple people with release
> access would work. Share the project's private key between them?
I think so, but I haven't thought through the implications all the way.
>> On the publishing side, one way would be to only accept deploys signed
>> by the original author or a key directly trusted by the author. Key
>> rotation makes this interesting since the same author doesn't have the
>> same key forever; they will definitely need to delegate this
>> publishing trust in a way that we can distinguish from regular trust.
> As well as losing keys, which is going to happen. For that reason alone
> I'd be inclined to treat them much like how we do the SSH keys.
True. That means initially an attacker getting access to Clojars
credentials would allow them to publish trusted artifacts, but as soon
as it was detected the key could be revoked. What GitHub does is send
out emails whenever a new SSH pubkey is added to your account; we could
probably do the same to detect intrusions more quickly.
> Interesting idea. I looked into Bouncy Castle and it claimed to
> implement signing and verification but not in a way that integrated with
> the agent, so I didn't look much further because always getting prompted
> is a lousy experience. But it could serve as a fallback considering we
> can't really rely on the agent consistently anyway.
Not sure how much point there'd be in using that though: if there's an
agent running, you're going to have gpg installed anyway.
>> As well as losing keys, which is going to happen. For that reason alone
>> I'd be inclined to treat them much like how we do the SSH keys.
> True. That means initially an attacker getting access to Clojars
> credentials would allow them to publish trusted artifacts, but as soon
> as it was detected the key could be revoked. What GitHub does is send
> out emails whenever a new SSH pubkey is added to your account; we could
> probably do the same to detect intrusions more quickly.
It would allow the attacker to publish artifacts, but not trusted ones,
unless there's a trust path to the attacker's key? Isn't the whole point
of the signing exercise that if Clojars is compromised or MITM'd, all is
not lost? :)
> Not sure how much point there'd be in using that though: if there's an agent
> running, you're going to have gpg installed anyway.
Hm; well it might let us simplify and only have a single code path vs
shelling out when we can and falling back to in-process calls
otherwise.
>> True. That means initially an attacker getting access to Clojars
>> credentials would allow them to publish trusted artifacts, but as soon
>> as it was detected the key could be revoked. What GitHub does is send
>> out emails whenever a new SSH pubkey is added to your account; we could
>> probably do the same to detect intrusions more quickly.
> It would allow the attacker to publish artifacts, but not trusted ones
> though unless there's a trust path to the attacker's key? Isn't the whole
> point of the signing exercise that if Clojars is compromised or MITM'd all is
> not lost? :)
On Mon, Jun 18, 2012 at 8:55 AM, Jack Moffitt <j...@metajack.im> wrote:
> Note that many places just require you to trust the repository, and
> then the repository maintainers have revocation on the sub-certs. All
> the App Stores work this way, and it seems a good middle ground. If a
> bad actor is detected, clojars could pull the cert. It seems a little
> more user friendly than having to expose the trust network. For
> starters, I might decide I trust various people on the clojure mailing
> list who post a lot, but there is no way for me to know that's who
> published a jar unless they also sign their emails or I just trust the
> author information.
Yeah, this is common among centralized schemes, but I don't want to
get into the business of policing content. We don't have anyone
working on this full-time, and as soon as we get into that game I
think people are going to start expecting a more active role from us.
Also with the current proposal you don't even have to trust Clojars;
if Clojars were compromised it wouldn't be possible to attack user
builds.
Every file that gets deployed gets placed on an in-memory
LinkedBlockingQueue, checked for blockers, and promoted to S3 if none
are found.
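
The shape of that pipeline, sketched in Python for illustration rather than as the actual Clojars code: Python's `queue.Queue` stands in for the `LinkedBlockingQueue`, `promote` is a caller-supplied function standing in for the S3 upload, and the dict keys and the particular checks in `blockers` are assumptions based on the requirements discussed earlier in the thread (no snapshots, fuller pom metadata). Signature checks are left out, as in the TODO list below.

```python
import queue


def blockers(artifact):
    """Return a list of reasons this artifact cannot be promoted."""
    problems = []
    if artifact["version"].endswith("-SNAPSHOT"):
        problems.append("snapshot version")
    for field in ("description", "url", "licenses"):
        if not artifact.get(field):
            problems.append("missing " + field)
    return problems


def drain(pending, promote):
    """Process everything currently queued.

    Calls `promote(artifact)` for each artifact with no blockers and
    returns a dict mapping (group, name, version) to its blocker list.
    """
    results = {}
    while True:
        try:
            artifact = pending.get_nowait()
        except queue.Empty:
            return results
        found = blockers(artifact)
        if not found:
            promote(artifact)
        coords = (artifact["group"], artifact["name"], artifact["version"])
        results[coords] = found
```

A long-running worker would block on the queue instead of draining it once; the one-shot form is used here only to keep the sketch easy to exercise. The returned blocker lists are the kind of thing that could be shown to a jar's owners.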
A few TODOs and open questions:
* Checks for signatures are unimplemented; should be easy to take code
from Leiningen for this unless we want to investigate bouncycastle.
* I think we'd want to set up periodic re-processing of unpromoted
artifacts in order to catch ones that could fall through the cracks,
for example from having the JVM restarted before the promotion queue
is empty. I think this lets us off the hook for having stateful
in-memory queues, but we'll have to see how well it works in
practice. I was thinking of exposing a web endpoint to initiate queueing
up all unpromoted non-snapshots and hitting it via curl from a cron job.
* We need to expose the list of blockers for a given deployment's
promotion in the web interface, probably only to owners of a given
jar.
* I'm not sure of the best way to store AWS credentials during development.
We don't want them checked into dev-resources/config.clj, so I've added
"local-resources" to :resource-paths in the dev profile.
* Do we have a name for the list of all artifacts involved in a single
deployment? I think the term "jars" is used for that in the codebase
currently, but it's not quite the right word.
Of course everything here is still open for discussion, but I thought
having some code to discuss might be helpful. I think the signature
checks are really the only thing preventing this from being usable as a
proof-of-concept.
Phil Hagelberg <p...@hagelb.org> writes:
> * I think we'd want to set up periodic re-processing of unpromoted
> artifacts in order to catch ones that could fall through the cracks,
> for example from having the JVM restarted before the promotion queue
> is empty. I think this lets us off the hook for having stateful
> in-memory queues, but we'll have to see how well it works in
> practice. I was thinking of exposing a web endpoint to initiate queueing
> up all unpromoted non-snapshots and hitting it via curl from a cron job.
Would the set of unpromoted artifacts be bounded, or will this grow over time?
> * I'm not sure of the best way to store AWS credentials during development.
> We don't want them checked into dev-resources/config.clj, so I've added
> "local-resources" to :resource-paths in the dev profile.
Having the credentials in a configuration file would seem safer to me,
so you don't have jars on disc containing your credentials. I guess the
"local-resources" directory could just be excluded from the jar.
How would this work in production - would you start the process with an
extra classpath argument pointing to the location of the production
credentials?
> * Do we have a name for the list of all artifacts involved in a single
> deployment? I think the term "jars" is used for that in the codebase
> currently, but it's not quite the right word.
In my last mail I said signatures were the one thing blocking using it
as a proof-of-concept, but flipping a "promoted" flag and having that
cause the snapshots repo to refuse overwriting deploys also needs
to be implemented.
Hugo Duncan <h...@hugoduncan.org> writes:
> Would the set of unpromoted artifacts be bounded, or will this grow over time?
Maybe we could have the hourly cron job only act upon the last 24 hours
or so worth of artifacts, and if that turns out to be insufficient we
could add a weekly one for the entire set. If it takes a while it would
be easy to have a foreground and background queue so direct handling of
uploads wouldn't get clogged up with the background processing.
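
Roughly what the selection step for those cron runs might look like, sketched in Python for illustration. The function name, the dict keys (`promoted`, `version`, `deployed_at`) and the `window` parameter are all assumptions, not part of the existing code.

```python
from datetime import datetime, timedelta


def requeue_candidates(artifacts, now, window=timedelta(hours=24)):
    """Pick unpromoted non-snapshot artifacts deployed within `window`.

    Pass window=None to select the entire unpromoted set, as the
    less frequent full run would.
    """
    selected = []
    for artifact in artifacts:
        if artifact["promoted"] or artifact["version"].endswith("-SNAPSHOT"):
            continue
        if window is not None and now - artifact["deployed_at"] > window:
            continue
        selected.append(artifact)
    return selected
```

The hourly job would use the default window and the weekly one `window=None`; whatever is selected would go onto the background queue so that direct uploads are not held up.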
> Having the credentials in a configuration file would seem safer to me,
> so you don't have jars on disc containing your credentials. I guess the
> "local-resources" directory could just be excluded from the jar.
Yeah, keeping it in the :dev profile takes care of keeping it out of jars.
> How would this work in production - would you start the process with an
> extra classpath argument pointing to the location of the production
> credentials?
We already have a production-specific config file, so it's just a matter
of adding the credentials there.
>> * Do we have a name for the list of all artifacts involved in a single
>> deployment? I think the term "jars" is used for that in the codebase
>> currently, but it's not quite the right word.
> release artifacts?
Hm; so the abstract grouping of all artifacts with a given
group/artifact/version is a "release"? But that implies non-snapshots,
which isn't always the case. Maybe a "deployment"?