Proposal for next-generation Clojars

Phil Hagelberg

Mar 9, 2012, 11:09:38 PM

to clojars-m...@googlegroups.com

Hello folks.

Now that the bcrypt functionality[1] has been deployed, I'd like to
move forward with a further proposal for a revamped Clojars. I'd be
happy to get your opinions on the following.

thanks,
Phil

[1] - https://groups.google.com/group/clojure/browse_thread/thread/5e0d48d2b82df39b

# Proposal for Next-Generation Clojars

Clojars should be a frontend for deploying jars directly to a
repository hosted on S3 for improved redundancy and reliability.

## Separate Repositories

It's desirable to have as few snapshot repositories in your list as
possible in order to speed up dependency resolution. In addition, the
presence of snapshots introduces undesirable semantics for version
ranges. Because of this, the final release of Leiningen 2 will not
have any snapshot repositories in its default list. This means Clojars
will need to serve a separate repository to segregate releases from
snapshots.

Rather than the Maven world's traditional snapshots/releases split,
the original repository will be left alone as an "anything goes" repo
while the new next-generation repository would only accept releases.

## Maintaining Backwards Compatibility

Leiningen 1 will only check the old repository, but a redirect rule
can be added to it so that artifacts that are not found can fall
through to the new repository. Some restrictions may be required here;
for instance artifacts that have had pushes to the new repository
should not accept uploads to the old anymore.
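
The fall-through rule above could be sketched roughly as follows. This is only a minimal illustration in Python, not an actual web-server redirect rule; the in-memory sets, the new repository's URL, and the function names are hypothetical stand-ins, since the thread does not say where the new repository would live.

```python
from typing import Optional

# Hypothetical base URLs; only the old repository's URL appears in the thread.
OLD_REPO = "http://clojars.org/repo"
NEW_REPO = "http://releases.example.org/repo"


def resolve(path: str, old_paths: set, new_paths: set) -> Optional[str]:
    """Return the URL a request to the old repository should end up at.

    Paths present in the old repository are served from it directly; anything
    missing falls through to the new repository, mimicking a redirect rule.
    Returns None when neither repository has the path.
    """
    if path in old_paths:
        return f"{OLD_REPO}/{path}"
    if path in new_paths:
        return f"{NEW_REPO}/{path}"
    return None


def old_repo_accepts_upload(artifact: str, pushed_to_new: set) -> bool:
    """Once an artifact has pushes in the new repo, the old one refuses uploads."""
    return artifact not in pushed_to_new
```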

In addition, compatibility needs to go the other way. At the time the
new repository is launched, it should be seeded with all non-snapshot
artifacts from the old that are grandfathered in. The two repositories
should share a single user database and group permissions list.

## Requirements

The new repository will not accept snapshots. We may also want to
impose additional requirements on uploads. One possibility would be to
require signing of jars. We need to investigate how this can be done
with the minimum hassle, and to discuss further policies that can be
built around it.

Another possibility is to require more elements of the pom to be
filled out. In particular, fields of interest include description,
URL, licenses, and SCM availability. Having a richer set of metadata
available could prove valuable down the line.
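
A metadata check along these lines might look like the sketch below. It assumes the four fields named above map to the pom elements `description`, `url`, `licenses/license`, and `scm`; the function name `missing_pom_fields` is hypothetical, and which fields would actually be required is still undecided.

```python
import xml.etree.ElementTree as ET

POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def missing_pom_fields(pom_xml: str) -> list:
    """Return the names of the metadata fields that are absent from a pom."""
    root = ET.fromstring(pom_xml)

    def find(path: str):
        # Try the namespaced form first, then fall back to an un-namespaced pom.
        namespaced = "/".join(POM_NS + part for part in path.split("/"))
        found = root.find(namespaced)
        return found if found is not None else root.find(path)

    def has_text(path: str) -> bool:
        node = find(path)
        return node is not None and bool((node.text or "").strip())

    missing = []
    if not has_text("description"):
        missing.append("description")
    if not has_text("url"):
        missing.append("url")
    if find("licenses/license") is None:
        missing.append("licenses")
    if not has_text("scm/url") and not has_text("scm/connection"):
        missing.append("scm")
    return missing
```

Returning the list of missing fields, rather than a plain yes/no, would let the same check feed an error message for the uploader.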

## Uploads

Uploads to the new repository should be done over HTTP using the
standard Maven deployment mechanism. This will allow greater tooling
interoperability as well as reusing existing code and shelving the
custom SCP uploader. Jars once uploaded should be stored on S3 so that
downtime of the machines hosting the Clojars application will not
affect availability of jars. It will also make setting up a Clojars
instance for development much simpler since everything can be done
in-process.

## Migration

The first step is to migrate the database to PostgreSQL. Then both the
old and new repositories can share a database. Then, once the HTTP
upload functionality is finished, the new repository should be seeded
with releases from the old, and it can start accepting new uploads.

Phil Hagelberg

Mar 9, 2012, 11:19:27 PM

to clojars-m...@googlegroups.com

On Fri, Mar 9, 2012 at 8:09 PM, Phil Hagelberg <ph...@hagelb.org> wrote:
> ## Uploads
>
> Uploads to the new repository should be done over HTTP using the
> standard Maven deployment mechanism.

In the interests of full disclosure I should mention that we are
planning on running the new Clojars repository on Heroku.

-Phil

Arthur D. Edelstein

Mar 10, 2012, 2:41:42 AM

to clojars-m...@googlegroups.com

Hi Phil,

I think you're doing great work here!

I have a couple of comments.

> Rather than the Maven world's traditional snapshots/releases split,
> the original repository will be left alone as an "anything goes" repo
> while the new next-generation repository would only accept releases.

I like this plan and would suggest that in the new repo, jars with duplicate version numbers should not be permitted. For example, this directory shows several jars with the same version number:

http://clojars.org/repo/aleph/core/0.6.0-SNAPSHOT/

I also experimented and noticed that if I use "lein push" multiple times with the same (release) version number, I overwrite a jar already on Clojars. I think the new Clojars should instead have a policy of immutable dependencies, so that users of these libraries can be sure the bytecode won't change. Requiring monotonicity in version numbers is probably also preferable.

> The new repository will not accept snapshots. We may also want to
> impose additional requirements on uploads. One possibility would be to
> require signing of jars.

I particularly like this idea (or at least making signing an easy feature), because use of third party dependencies is based in part on the reputation of the authors.

> Another possibility is to require more elements of the pom to be
> filled out. In particular, fields of interest include description,
> URL, licenses, and SCM availability. Having a richer set of metadata
> available could prove valuable down the line.

What about linking the jars (in an automated way) to GitHub? If lein/Clojars could easily match artifacts to a GitHub commit, that would perhaps serve as a source of rich metadata.

I was also quite intrigued by the suggestion in ClojureScript One that was forked to this plugin:

https://github.com/tobyhede/lein-git-deps

The idea is that git URLs could be a useful way to manage source dependencies. I'm naively wondering if it might be good for the new Clojars to index dependency releases hosted on GitHub.

Best regards,

Arthur

Phil Hagelberg

Mar 10, 2012, 11:03:59 AM

to clojars-m...@googlegroups.com

"Arthur D. Edelstein" <arthure...@gmail.com> writes:

> I like this plan and would suggest that in the new repo, jars with
> duplicate version numbers should not be permitted. For example, this
> directory shows several jars with the same version number:
> http://clojars.org/repo/aleph/core/0.6.0-SNAPSHOT/

This should be taken care of by disallowing snapshots, actually.

> I also experimented and noticed that if I use "lein push" multiple
> times with the same (release) version number then I overwrite a jar
> already on clojars. I think the new Clojars should instead have a
> policy of immutable dependencies, so that users of these libraries
> can be sure the bytecode won't change. Probably requiring
> monotonicity in version numbers is also preferable.

Good idea; this should be explicitly enforced.
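
A rough sketch of how such enforcement might work follows. It assumes purely numeric dotted versions (Maven's real version ordering, with qualifiers, is more involved), and the names `parse_version` and `check_upload` are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split a purely numeric dotted version such as "1.2.10" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def check_upload(existing_versions: list, new_version: str) -> str:
    """Return "ok", or a short reason for rejecting the upload."""
    if new_version.endswith("-SNAPSHOT"):
        return "snapshots are not accepted"
    if new_version in existing_versions:
        return "version already exists; releases are immutable"
    # Assumption: every existing version is numeric, so tuples compare cleanly.
    newest = max((parse_version(v) for v in existing_versions), default=None)
    if newest is not None and parse_version(new_version) <= newest:
        return "version must be greater than every existing release"
    return "ok"
```

Note that strict monotonicity as written would also reject a maintenance release such as 1.2.1 once 2.0.0 exists, so that rule may need loosening.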

> What about linking the jars (in an automated way) to github? If lein/
> clojars could easily match artifacts to a github commit, that would
> perhaps serve as a source of rich metadata.

The poms generated by Leiningen will automatically contain links to the
git repository if they were generated from a project that's a git checkout.
But perhaps we should go the extra mile and require this.

> I was also quite intrigued by the suggestion in Clojurescript One
> that was forked to this plugin:
> https://github.com/tobyhede/lein-git-deps
> The idea that git urls could be a useful way to manage source
> dependencies. I'm naively wondering if it might be good for the new
> Clojars to index dependency releases hosted on github.

Keeping the scm metadata around is a fine idea. The idea of generating a
repo full of jars based on tags found in registered project git repos is
one that's occurred to me (indeed, this is what
http://melpa.milkbox.net/ does), but I think that would be a different
kind of repository from Clojars.

-Phil

Thomas Engelschmidt

Mar 10, 2012, 11:09:22 AM

to clojars-m...@googlegroups.com

Hi,

One of the features that the team I'm working with really likes is uploading new artifacts with scp :-)

It's so simple: just `scp x.jar pom.xml clojars.org:`

I agree that support for the standard Maven way of uploading would be great.

/zamaterian

Nelson Morris

Mar 10, 2012, 1:36:22 PM

to clojars-m...@googlegroups.com

Do separate repos get more than just shorter defaults? Does

{"clojars-releases" {:url "http://clojars.org/repo" :snapshots false}}

or

<repositories>
  <repository>
    <id>clojars-releases</id>
    <url>http://clojars.org/repo</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>

achieve the same things?

> ## Requirements
...

> Another possibility is to require more elements of the pom to be
> filled out. In particular, fields of interest include description,
> URL, licenses, and SCM availability. Having a richer set of metadata
> available could prove valuable down the line.

I'm a fan of using more info from the pom. Maybe not of requiring it.
Looking at a page like http://rubygems.org/gems/rails and being able
to see all the places to find info is nice.

>
> ## Uploads
>
> Uploads to the new repository should be done over HTTP using the
> standard Maven deployment mechanism. This will allow greater tooling
> interoperability as well as reusing existing code and shelving the
> custom SCP uploader. Jars once uploaded should be stored on S3 so that
> downtime of the machines hosting the Clojars application will not
> affect availability of jars. It will also make setting up a Clojars
> instance for development much simpler since everything can be done
> in-process.

+1 for HTTP uploads and S3. The HTTP uploads will make testing much
easier than faking scp.

>
> ## Migration
>
> The first step is to migrate the database to PostgreSQL. Then both the
> old and new repositories can share a database. Then, once the HTTP
> upload functionality is finished, the new repository should be seeded
> with releases from the old, and it can start accepting new uploads.

What about search? Currently it is some SQLite queries. Are there
plans to continue that in Postgres for a while? Any thoughts on Lucene
or, since it will be on Heroku, using
https://addons.heroku.com/searchify ?

Phil Hagelberg

Mar 10, 2012, 11:44:14 PM

to clojars-m...@googlegroups.com

On Sat, Mar 10, 2012 at 10:36 AM, Nelson Morris
<nmo...@nelsonmorris.net> wrote:
> Do separate repos get more than just shorter defaults? Does
>
> {"clojars-releases" {:url http://clojars.org/repo :snapshots false}}

I think it would have to live at a different URL. It could be the same
domain though.

> I'm a fan of using more info from the pom. Maybe not of requiring it.
> Looking at a page like http://rubygems.org/gems/rails and being able
> to see all the places to find info is nice.

I think requiring a license and URL is pretty reasonable. Description
helps a lot for search results, but I could budge on it. On the other
hand it doesn't take long to fill out. We should also check for the
FIXME defaults that Leiningen places in new project skeletons and
reject those.
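
Detecting leftover skeleton placeholders could be as simple as the sketch below. It assumes the placeholders all contain the literal text "FIXME" and that the pom metadata has already been parsed into a dict; `placeholder_fields` is a hypothetical name.

```python
def placeholder_fields(metadata: dict) -> list:
    """Return the names of metadata fields that still contain a FIXME placeholder."""
    return sorted(
        name
        for name, value in metadata.items()
        if isinstance(value, str) and "FIXME" in value
    )
```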

> What about search? Currently it is some sqlite queries. Are there
> plans to continue that in postgres for awhile? Any thoughts on lucene
> or since it will be on heroku using https://addons.heroku.com/searchify ?

I was originally thinking Lucene since we've already got the code for
searching that in lein search, but I think it's up to whoever
implements it. Postgres full-text search would have the advantage of
living in the database, and thus it wouldn't have to be rebuilt when a
new node is spun up. Fewer moving parts is always nice.

-Phil

Arthur D. Edelstein

Mar 11, 2012, 4:03:41 PM

to clojars-m...@googlegroups.com

> What about search? Currently it is some sqlite queries. Are there
> plans to continue that in postgres for awhile? Any thoughts on lucene
> or since it will be on heroku using https://addons.heroku.com/searchify ?

> I was originally thinking Lucene since we've already got the code for
> searching that in lein search, but I think it's up to whoever
> implements it. Postgres full-text search would have the advantage of
> living in the database, and thus it wouldn't have to be rebuilt when a
> new node is spun up. Fewer moving parts is always nice.

Since search on Clojars is being discussed, I should mention that I am currently developing a website that lets one search for Clojure vars. Clojure source (.clj) files are extracted from jar files, parsed for vars (functions, macros, etc.) and then indexed by the Solr/Lucene search engine. I haven't worked on a proper front end yet, but you can get the idea:

http://collaj.net

Currently the search database includes the latest release of every artifact in Clojars. I'm hoping in the future to extend it to Maven Central and possibly GitHub as well. I can see that project descriptions are something I should add to the search.

Any suggestions or comments would be welcome at this early stage. Thanks!

Arthur

Phil Hagelberg

Jun 13, 2012, 10:49:54 PM

to clojars-m...@googlegroups.com

After some discussion on IRC I've come up with a slightly different
proposal for a next-gen repository. It involves releases being
deployed to the existing repository and promoted to the releases repo
on S3 only when they pass the necessary qualifications.

* Library author invokes `lein deploy clojars` on a non-snapshot version.
* Leiningen invokes gpg to sign both the jar and the pom.
* Leiningen transmits the jar, pom, and two signatures over HTTP.
* Clojars accepts each file.
* After each file is transmitted, the set of artifacts as a whole is
  evaluated to see if it qualifies for promotion to the releases
  repository. (signatures match, pom metadata is found)
* If so, they're all uploaded to S3.
* If not, a list of reasons why it doesn't qualify is available to the
  owner through the web UI.
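
The evaluation step in the list above could be sketched as follows. This is only an illustration: the `Deployment` record and `promotion_blockers` function are hypothetical, and signature verification and pom parsing are assumed to have happened elsewhere, with their results stored on the record.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    version: str
    files: set  # file names received so far, e.g. {"lib.jar", "lib.pom"}
    valid_signatures: set = field(default_factory=set)    # files whose signature verified
    missing_metadata: list = field(default_factory=list)  # required pom fields not found


def promotion_blockers(dep: Deployment) -> list:
    """Return the reasons a deployment cannot be promoted; an empty list means it qualifies."""
    reasons = []
    if dep.version.endswith("-SNAPSHOT"):
        reasons.append("snapshot versions are not promoted")
    for required in ("jar", "pom"):
        matching = [name for name in sorted(dep.files) if name.endswith("." + required)]
        if not matching:
            reasons.append(f"no {required} uploaded")
        for name in matching:
            if name not in dep.valid_signatures:
                reasons.append(f"{name} has no valid signature")
    for field_name in dep.missing_metadata:
        reasons.append(f"pom is missing {field_name}")
    return reasons
```

An empty result would trigger the upload to S3; a non-empty one is the list of reasons shown to the owner.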

Part of the reason this is preferable to deploying directly to the
releases repository is that Aether has no notion of transactionality,
so it must accept each jar and pom file before it knows whether they
are really qualified to be deployed. We could fake transactionality on
top of that, but if we're going to need a state machine then we might
as well use the state of the existing repository.

I think this will turn out to be much simpler than the original
proposal since it still only involves a single Clojars instance. We
wouldn't even need to port it off SQLite, but we would still get the
benefits of redundancy from S3.

Would like to hear your thoughts on this approach.

thanks,
Phil

Paul Stadig

Jun 14, 2012, 3:10:17 PM

to clojars-m...@googlegroups.com

Couldn't you achieve the same effect by using a staging area and verification process, before uploading to S3?

Is the assumption that every deployment to the existing repository would automatically get promoted, or would there be some kind of lever that people could pull? I didn't catch that from your description.

If the assumption is that everything is going to be auto promoted, then you could use a staging area to fake transactionality without going through an extra repo.

Paul

Phil Hagelberg

Jun 14, 2012, 4:19:13 PM

to clojars-m...@googlegroups.com

On Thu, Jun 14, 2012 at 12:10 PM, Paul Stadig <pa...@stadig.name> wrote:
> Couldn't you achieve the same effect by using a staging area and
> verification process, before uploading to S3?

That would probably achieve the same effect, but inventing a new place
to store state makes things more complicated, especially for the
future when we have to scale beyond one node. In this proposal the
existing repository *is* the staging area.

> Is the assumption that every deployment to the existing repository would
> automatically get promoted, or would there be some kind of lever that people
> could pull? I didn't catch that from your description.

Every deployment would automatically be considered for promotion, but
to be promoted they will have to qualify first by 0) not being a
snapshot, 1) declaring all required metadata, and 2) including
signatures that verify for both the pom and the jar.

-Phil

Alex Osborne

Jun 17, 2012, 4:38:15 AM

to clojars-maintainers

On Jun 14, 12:49 pm, Phil Hagelberg <p...@hagelb.org> wrote:

> After some discussion on IRC I've come up with a slightly different
> proposal for a next-gen repository. It involves releases being
> deployed to the existing repository and promoted to the releases repo
> on S3 only when they pass the necessary qualifications.
>
> * Library author invokes `lein deploy clojars` on a non-snapshot version.
> * Leiningen invokes gpg to sign both the jar and the pom.

I'm concerned that, if we start enforcing it, we need to ensure the jar
signing procedure is straightforward for people who do not use gpg
frequently (and I would expect a large proportion of the community is
in that category). As far as I can tell, enforcing jar signing
currently introduces little benefit, as tools do not yet check
the signatures.

Clojars' only reason for existence is to address usability problems
with Maven Central and lower the barrier to entry. If we reach a point
(either through us changing or Central changing) where pushing to
Clojars is as onerous as pushing to Central, we should stop
accepting uploads and instead become a discovery tool for finding
Clojure libraries in Central. Otherwise we're just needlessly
fragmenting.

I would like to know what ideas people have for how jar-signing is
actually to be enforced and used. There seems to me very little point
in signatures unless you actually check the signer against a list.
That's relatively easy in a conventional Linux distro setting with a
coordinated release team, not so easy in our case unless I'm missing
something.

I do think signing is a good idea, I just think we should have a
better idea of where we're going with it in order to help make
decisions like this on how to transition. This may well have been
covered in IRC conversations that I missed.

> * Leiningen transmits the jar, pom, and two signatures over HTTP.
> * Clojars accepts each file.
> * After each file is transmitted, the set of artifacts as a whole is
>   evaluated to see if it qualifies for promotion to the releases
>   repository. (signatures match, pom metadata is found)
> * If so, they're all uploaded to S3.
> * If not, a list of reasons why it doesn't qualify is available to the
>   owner through the web UI.

> Part of the reason this is preferable to deploying directly to the
> releases repository is that Aether has no notion of transactionality,
> so it must accept each jar and pom file before it knows whether they
> are really qualified to be deployed. We could fake transactionality on
> top of that, but if we're going to need a state machine then we might
> as well use the state of the existing repository.

Right. The trouble with the HTTP upload mechanism is it's difficult to
validate and difficult to give feedback.

I do like the idea of being able to view the history and status of
your recent pushes through the web UI. Although if possible Leiningen
should print it immediately as well. Perhaps printing a link to the
status page would be enough.

The main (and only?) benefit I see to staging to the existing
repository rather than to an inaccessible temporary directory is to
allow for an incremental migration. If one of your dependencies has
issues blocking it from entering the more stringent repo, you can at
least upload and you will be automatically promoted once upstream is
fixed.

This does have the implication that releases are not immutable, at
least in the staging repository. This is ok by me, it's been quietly
the current policy and generally been more useful (fixing mistakes in
metadata) than it has been painful. There has been some pain though.
People often don't understand caching and are surprised when their
change doesn't make it to clients.

> I think this will turn out to be much simpler than the original
> proposal since it still only involves a single Clojars instance. We
> wouldn't even need to port it off SQLite, but we would still get the
> benefits of redundancy from S3.

I am very much in favour of smaller incremental changes than trying to
tie multiple changes together.

Alex Osborne

Jun 17, 2012, 5:08:11 AM

to clojars-maintainers

On Jun 15, 6:19 am, Phil Hagelberg <p...@hagelb.org> wrote:

> On Thu, Jun 14, 2012 at 12:10 PM, Paul Stadig <p...@stadig.name> wrote:
> > Couldn't you achieve the same effect by using a staging area and
> > verification process, before uploading to S3?
>
> That would probably achieve the same effect, but inventing a new place
> to store state makes things more complicated,

Upsides (of using the existing repo as staging):
* it's visible
* it doesn't break any existing workflows.
* it allows for incremental migration

Downsides:
* it exposes incorrect jars/metadata
* it means two sets of data need to be maintained over time

> especially for the future when we have to scale beyond one node.

The only scenario I can think of where we'd need multiple upload nodes
is if we're doing some insane level of processing. Perhaps metadata
extraction? Super fancy doc generation? But if we're blocking the
upload process with something that expensive I think we're doing it
wrong anyway.

> In this proposal the
> existing repository *is* the staging area.
>
> > Is the assumption that every deployment to the existing repository would
> > automatically get promoted, or would there be some kind of lever that people
> > could pull? I didn't catch that from your description.
>
> Every deployment would automatically be considered for promotion, but
> to be promoted they will have to qualify first by 0) not being a
> snapshot, 1) declaring all required metadata, and 2) including
> signatures that verify for both the pom and the jar.

I think it's important this happens automatically and without any
unnecessary delays (no "nightly" or "weekly" batch silliness). My
understanding of Phil's proposal is that it's triggered automatically
when your jar (or one of your dependencies?) is pushed, plus a once-off
run against all the existing content when the change is introduced.

Phil Hagelberg

Jun 18, 2012, 1:30:02 AM

to clojars-m...@googlegroups.com

You raise some good points as always. Some more context is definitely needed.

On Sun, Jun 17, 2012 at 1:38 AM, Alex Osborne <a...@meshy.org> wrote:
> I'm concerned that, if we start enforcing it, we need to ensure the jar
> signing procedure is straightforward for people who do not use gpg
> frequently (and I would expect a large proportion of the community are
> in that category). As far as I can tell enforcing jar-signing
> currently introduces little benefit as tools currently do not check
> the signatures.

Indeed, I've been thinking of signatures as the foundation on which
further security measures can be laid. It's necessary, but certainly
not sufficient on its own. That said, we currently have rudimentary
signature checks implemented on Leiningen master
(http://p.hagelb.org/deps-verify.html). This only gives us a binary y/n
as far as whether the jars are signed by someone with a key on MIT's
public key server, but it can be expanded to determine whether there's
a valid trust path from your trusted keys to the signer of the
artifact.

What will be checked by default is still an open question, but without
having signatures it's impossible to build a system which can be
trusted. I don't believe simply supporting signatures is enough;
rubygems supports signatures but nobody I've talked to is even aware
of this fact since signing is not required. I expect a typical
deployment would look something like `lein do deps :verify, test,
uberjar`. A bit of background reading:
http://branchandbound.net/blog/security/2012/03/crossbuild-injection-how-safe-is-your-build/
(followed of course by http://p.hagelb.org/scared.gif)

Of course, care should be taken for the user experience. Right now on
the deploy-asc branch of Leiningen, each repository has a
:sign-releases option. If it's set, then a signature for the jar and
pom will be generated and deployed. If you have a key in place and
your agent is set up, there is nothing more to do; it simply works. So
I believe the only two issues are A) user education; ensuring people
learn how to generate and protect their keys. But this is something
all professional software developers _should_ already know how to do.
Admittedly crypto literacy is not high among developers, but it needs
to be regardless of Leiningen--it's not OK for professionals to remain
ignorant of it. Perhaps practical objections ("people won't bother")
can be raised here, but I don't buy the "it's too hard" argument.

B) is stickier; apparently gpg-agent is not well-integrated with
certain operating systems. On Gnome it will prompt you for your
passphrase if the agent doesn't have it cached, but apparently Macs
and Windows don't work that way out of the box. It looks like this can
be addressed with http://www.funtoo.org/wiki/Keychain, but I haven't
looked into it and the need for further third-party software to
address OS shortcomings is unfortunate. Perhaps deploying is uncommon
enough that entering the passphrase every time is OK.

> Clojars' only reason for existence is to address usability problems
> with Maven Central and lower the barrier to entry. If we reach a point
> (either through us changing or Central changing) where pushing to
> Clojars is equally onerous as pushing to Central, we should stop
> accepting uploads and instead become a discovery tool for finding
> Clojure libraries in Central. Otherwise we're just needlessly
> fragmenting.

Indeed, we need to take care. Clojars is still much more open in that
releases don't require human intervention to push out and there are no
requirements for verbose domain-centric group IDs. And the existing
repo isn't going anywhere either, other than out of the defaults list.

> I would like to know what ideas people have for how jar-signing is
> actually to be enforced and used. There seems to me very little point
> in signatures unless you actually check the signer against a list.
> That's relatively easy in a conventional Linux distro setting with a
> coordinated release team, not so easy in our case unless I'm missing
> something.

Ideally I would like each project to be able to specify who they
trust. There would probably be some keys trusted out of the box (you
have to trust Leiningen itself, for instance) and the keys trusted by
the authors of Leiningen itself seem like a good bet. But then
projects should be able to specify how many steps away from a trusted
source are still considered trusted (7 degrees of Kevin Bacon, etc)
and add and revoke additional keys. Another setting would be to
require more overlapping signatures for a key before it's considered
trusted. I'd imagine corporate projects should want stricter control
than open source community ones.
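
To make the "steps away" and "overlapping signatures" settings concrete, here is a toy sketch. It is not GPG's actual trust model: key IDs are plain strings, `signed_by` is a hypothetical mapping from a key to the keys that have signed it, and both parameters are illustrative settings a project might expose.

```python
def is_trusted(signer, trusted_roots, signed_by, max_steps=2, min_signatures=1):
    """Decide whether `signer` is reachable from the project's trusted keys.

    A key becomes trusted when at least `min_signatures` already-trusted keys
    have signed it. Trust spreads one hop per round, for at most `max_steps`
    rounds starting from the roots.
    """
    trusted = set(trusted_roots)
    for _ in range(max_steps):
        newly_trusted = {
            key
            for key, signers in signed_by.items()
            if key not in trusted and len(set(signers) & trusted) >= min_signatures
        }
        if not newly_trusted:
            break
        trusted |= newly_trusted
    return signer in trusted
```

A stricter project would lower `max_steps` or raise `min_signatures`; revoking a key amounts to removing it from `trusted_roots` or from `signed_by`.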

So far everything I've read about GPG has been very user-centric
rather than project-centric, but I think this should be doable. This
is on the consuming side.

On the publishing side, one way would be to only accept deploys signed
by the original author or a key directly trusted by the author. Key
rotation makes this interesting since the same author doesn't have the
same key forever; they will definitely need to delegate this
publishing trust in a way that we can distinguish from regular trust.

We definitely need more discussion around these policies; nothing's
been decided. We're still laying the foundation.

> I do like the idea of being able to view the history and status of
> your recent pushes through the web UI. Although if possible Leiningen
> should print it immediately as well. Perhaps printing a link to the
> status page would be enough.

We'll need to look into this; one drawback of HTTP deploys is that
they don't expose stdout for arbitrary messages like scp does. Worst
case is we special case it so it can check the URL being deployed to
and emit a message for Clojars.

> The main (and only?) benefit I see to staging to the existing
> repository rather than to an inaccessible temporary directory is to
> allow for an incremental migration. If one of your dependencies has
> issues blocking it from entering the more stringent repo, you can at
> least upload and you will be automatically promoted once upstream is
> fixed.

Yeah, I was thinking that not requiring any changes to the workflow
would be a big win, but it occurs to me that in Leiningen 2.0.0,
"clojars" will refer to the S3 repo, which can't be deployed directly
to. So I'm on the fence now about using the existing repo vs creating
a new one just for releases. It does seem like fewer moving parts, but
maybe it's making parts do double-duty in ways that are inappropriate.
Needs further thought.

> This does have the implication that releases are not immutable, at
> least in the staging repository. This is ok by me, it's been quietly
> the current policy and generally been more useful (fixing mistakes in
> metadata) than it has been painful. There has been some pain though.
> People often don't understand caching and are surprised when their
> change doesn't make it to clients.

If we go with promoting out of the existing repository, I think we
should save a bit for each jar for whether it's been promoted.
Re-deploying a jar that's been promoted should probably be disallowed,
even if it's to the old repo, simply to avoid confusing scenarios.
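
The promoted bit could behave roughly like this sketch, where `StagingRepo` is a hypothetical in-memory stand-in for the existing repository: redeploys are allowed until a coordinate is promoted, and refused afterwards.

```python
class StagingRepo:
    """In-memory stand-in for the existing repository used as a staging area."""

    def __init__(self):
        self.jars = {}         # (artifact, version) -> jar bytes
        self.promoted = set()  # (artifact, version) pairs already promoted

    def deploy(self, artifact: str, version: str, content: bytes) -> bool:
        """Store the jar unless that coordinate has been promoted; return True if stored."""
        key = (artifact, version)
        if key in self.promoted:
            return False
        self.jars[key] = content
        return True

    def mark_promoted(self, artifact: str, version: str) -> None:
        """Set the promoted bit; from now on the coordinate is frozen."""
        self.promoted.add((artifact, version))
```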

-Phil

Phil Hagelberg

Jun 18, 2012, 1:34:44 AM

to clojars-m...@googlegroups.com

On Sun, Jun 17, 2012 at 2:08 AM, Alex Osborne <a...@meshy.org> wrote:
> Upsides (of using the existing repo as staging):
> * it's visible
> * it doesn't break any existing workflows.
> * it allows for incremental migration
>
> Downsides:
> * it exposes incorrect jars/metadata
> * it means two sets of data need to be maintained over time

I'm not sure I follow; can you elaborate on this? Would a policy of
freezing a jar in the old repository once it's been promoted to the
new one address it?

> The only scenario I can think of where we'd need multiple upload nodes
> is if we're doing some insane level of processing. Perhaps metadata
> extraction? Super fancy doc generation? But if we're blocking the
> upload process with something that expensive I think we're doing it
> wrong anyway.

Quite right; this is probably decades out and not worth worrying about.

> I think it's important this happens automatically and without any
> unnecessary delays (no "nightly" or "weekly" batch silliness). My
> understanding of Phil's proposal is that it's triggered automatically
> when your jar (or one of your dependencies?) is pushed. Plus a once-off run
> against all the existing content when the change is introduced.

Definitely. I hadn't considered the dependencies case; I suppose it
doesn't make sense to promote a jar if it depends on jars that are
only found in the snapshots repo.
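
The dependencies case might be handled as in this sketch, which assumes every dependency lives in Clojars (dependencies from Maven Central would need separate treatment). The function names and the `depends_on` mapping are hypothetical.

```python
def promotable(artifact: str, qualifies: set, promoted: set, depends_on: dict) -> bool:
    """An artifact may be promoted only if it qualifies and all its dependencies are promoted."""
    return artifact in qualifies and all(
        dep in promoted for dep in depends_on.get(artifact, ())
    )


def promote_all(qualifies: set, depends_on: dict) -> set:
    """Repeatedly promote whatever has become promotable until nothing changes."""
    promoted = set()
    changed = True
    while changed:
        changed = False
        for artifact in sorted(qualifies - promoted):
            if promotable(artifact, qualifies, promoted, depends_on):
                promoted.add(artifact)
                changed = True
    return promoted
```

The loop models the once-off run over existing content; in steady state the same check would be re-run for an artifact's dependents whenever that artifact is promoted.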

-Phil

Alex Osborne

Jun 18, 2012, 8:10:19 AM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> On Sun, Jun 17, 2012 at 2:08 AM, Alex Osborne <a...@meshy.org> wrote:
>> Upsides (of using the existing repo as staging):
>> * it's visible
>> * it doesn't break any existing workflows.
>> * it allows for incremental migration
>>
>> Downsides:
>> * it exposes incorrect jars/metadata
>> * it means two sets of data need to be maintained over time
>
> I'm not sure I follow; can you elaborate on this?

One of the possible gotchas I've been thinking about:

Assume a jar isn't initially promoted because it's incorrect or
incomplete in some way (bad metadata, not signed). You repush
the same version string with good metadata in order to fix it and then
the promotion happens.

But what about users who've already been exposed to the early push of
the jar via the moshpit? They're going to have the incorrect version
cached in ~/.m2.

So I've been looking into it a bit further and it sounds like this
behaviour was fixed in Aether:

https://cwiki.apache.org/MAVEN/maven-3x-compatibility-notes.html#Maven3.xCompatibilityNotes-ResolutionfromLocalRepository

This may imply we need to be careful to use a different "repoId" for
moshpit and releases and not just change the url.

There's also this:

http://jira.codehaus.org/browse/MNG-5185

I hope that's not saying that having an old version cached means Aether
starts claiming the jar is not found. That would really suck. Surely
if that's the case we would have run into it with the central
clojure.jar outage workaround though.

> Would a policy of freezing a jar in the old repository once it's been
> promoted to the new address it?

Right, makes sense.

Alex Osborne

Jun 18, 2012, 11:41:50 AM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> What will be checked by default is still an open question, but without
> having signatures it's impossible to build a system which can be
> trusted. I don't believe simply supporting signatures is enough;
> rubygems supports signatures but nobody I've talked to is even aware
> of this fact since signing is not required.

I agree, but I would take that further and say that just requiring
signatures isn't enough either. How many people consistently check
Maven Central signatures -- even those who are deploying signed jars to
Central themselves?

> A bit of background reading:
> http://branchandbound.net/blog/security/2012/03/crossbuild-injection-how-safe-is-your-build/
> (followed of course by http://p.hagelb.org/scared.gif)

Yes.

> Of course, care should be taken for the user experience. Right now on
> the deploy-asc branch of Leiningen, each repository has a
> :sign-releases option. If it's set, then a signature for the jar and
> pom will be generated and deployed. If you have a key in place and
> your agent is set up, there is nothing more to do; it simply works. So
> I believe the only two issues are A) user education; ensuring people
> learn how to generate and protect their keys. But this is something
> all professional software developers _should_ already know how to do.
> Admittedly crypto literacy is not high among developers, but it needs
> to be regardless of Leiningen--it's not OK for professionals to remain
> ignorant of it. Perhaps practical objections ("people won't bother")
> can be raised here, but I don't buy the "it's too hard" argument.

Is installing GPG and learning its CLI going to be a prerequisite to
using Leiningen? I argue that "crypto literacy" and knowing how to use
a particular tool are different things.

Linux users will generally already have it installed due to yum, apt and
friends being dependent on it. OS X and Windows users generally won't.
Even if they do have it installed I suspect there's a good chance it's
not going to be in $PATH (needs verification).

One of the reasons for the lein-clojars plugin is that Windows
doesn't ship an SSH client by default and SSH is not something most
Windows developers have any reason to be familiar with. Forcing Windows
developers to install and learn OpenSSH or PuTTY, which for them is an
obscure tool they may well never need otherwise, is a bad user
experience.

Generally developers, even if they don't use SSH, understand the concept
of a public/private keypair used for authentication. They don't need to
learn the protocol details before getting started using it. They *do*
need to understand what the lein-clojars "keygen" command is doing at a
conceptual level.

So maybe it's desirable to have a similar plugin that causes Lein to use
a copy of the Bouncy Castle PGP library bundled with the plugin if GPG
isn't installed?

> B) is stickier; apparently gpg-agent is not well-integrated with
> certain operating systems. On Gnome it will prompt you for your
> passphrase if the agent doesn't have it cached, but apparently Macs
> and Windows don't work that way out of the box. It looks like this can
> be addressed with http://www.funtoo.org/wiki/Keychain, but I haven't
> looked into it and the need for further third-party software to
> address OS shortcomings is unfortunate. Perhaps deploying is uncommon
> enough that entering the passphrase every time is OK.

Yes, I think it's OK. ... although maybe it'd be good not to require
prompting for the passphrase twice per deploy (repo credentials *and*
signing key) if possible. I guess that's probably difficult though.

If we're going to require signing anyway, this did get me stumbling
across: http://gpgauth.org/

> Ideally I would like each project to be able to specify who they
> trust. There would probably be some keys trusted out of the box (you
> have to trust Leiningen itself, for instance) and the keys trusted by
> the authors of Leiningen itself seem like a good bet. But then
> projects should be able to specify how many steps away from a trusted
> source are still considered trusted (7 degrees of Kevin Bacon, etc)
> and add and revoke additional keys. Another setting would be to
> require more overlapping signatures for a key before it's considered
> trusted. I'd imagine corporate projects should want stricter control
> than open source community ones.

That took me a while to understand -- at first I was trying to figure
out why a project's trust paths might ever differ from its dependency
path. I wonder if that means I'm one of your "crypto illiterate". :)

(I've used GPG a little. I've used/admined SSH, SSL, LUKS and friends a
lot. I'm familiar conceptually with asymmetric vs symmetric encryption,
signatures, MACs, the web of trust, various ciphers and hashes etc.
Never had cause to use a PGP-style web of trust model in practice
though so far.)

I think I'm finally seeing where you're heading with this and it
actually sounds quite exciting *if*:

* it can be pulled off without nagging people to the point where they
blindly trust everything just to get the warnings out of their face. I
agree that having good out of the box trust is absolutely crucial here.

* we can manage to get strong adoption which I think means tearing down
barriers like people having to stuff around installing GPG and agents.
That doesn't mean dumbing things down so that people don't have to
understand the concepts, it means making them convenient, especially
making them more convenient than any alternatives.

> So far everything I've read about GPG has been very user-centric
> rather than project-centric, but I think this should be doable. This
> is on the consuming side.

One interesting part is how a project with multiple people with release
access would work. Share the project's private key between them?

> On the publishing side, one way would be to only accept deploys signed
> by the original author or a key directly trusted by the author. Key
> rotation makes this interesting since the same author doesn't have the
> same key forever; they will definitely need to delegate this
> publishing trust in a way that we can distinguish from regular trust.

As well as losing keys, which is going to happen. For that reason alone
I'd be inclined to treat them much like how we do the SSH keys.

Jack Moffitt

Jun 18, 2012, 11:55:12 AM

to clojars-m...@googlegroups.com

> * it can be pulled off without nagging people to the point where they
> blindly trust everything just to get the warnings out of their face. I
> agree that having good out of the box trust is absolutely crucial here.

Note that many places just require you to trust the repository, and
then the repository maintainers have revocation on the sub-certs. All
the App Stores work this way, and it seems a good middle ground. If a
bad actor is detected, clojars could pull the cert. It seems a little
more user friendly than having to expose the trust network. For
starters, I might decide I trust various people on the clojure mailing
list who post a lot, but there is no way for me to know that's who
published a jar unless they also sign their emails or I just trust the
author information.

jack.

Phil Hagelberg

Jun 18, 2012, 2:10:29 PM

to clojars-m...@googlegroups.com

On Mon, Jun 18, 2012 at 8:41 AM, Alex Osborne <a...@meshy.org> wrote:
> Is installing GPG and learning its CLI going to be a prerequisite to
> using Leiningen? I argue that "crypto literacy" and knowing how to use
> a particular tool are different things.

It will only be necessary for certain operations. Right now those are:
0) deploy releases, 1) verify dependencies or 2) read private repository
credentials from disk.

An alternative for 2 would be to tie into OS level secret storage. I
don't think it helps with 0 and 1, so I'd rather get the portable
implementation right before worrying about this personally, but anyone
interested in this feature could certainly look into it. I recall there
being issues with granularity here on the OS X side: at least with Ruby
it doesn't distinguish between granting access to one ruby program vs
another. But that's such a huge leap over storing credentials in
plaintext that it probably shouldn't be a blocker.

> Generally developers, even if they don't use SSH, understand the concept
> of a public/private keypair used for authentication. They don't need to
> learn the protocol details before getting started using it. They *do*
> need to understand what the lein-clojars "keygen" command is doing at a
> conceptual level.
>
> So maybe it's desirable to have a similar plugin that causes Lein to use
> a copy of the Bouncy Castle PGP library bundled with the plugin if GPG
> isn't installed?

Interesting idea. I looked into Bouncy Castle and it claimed to
implement signing and verification but not in a way that integrated with
the agent, so I didn't look much further because always getting prompted
is a lousy experience. But it could serve as a fallback considering we
can't really rely on the agent consistently anyway.

> If we're going to require signing anyway, this did get me stumbling
> across: http://gpgauth.org/

Wow, looks interesting. I've wished for ages to be able to authenticate
in the browser with my SSH pubkey, but this would do the trick too.

> (I've used GPG a little. I've used/admined SSH, SSL, LUKS and friends a
> lot. I'm familiar conceptually with asymmetric vs symmetric encryption,
> signatures, MACs, the web of trust, various ciphers and hashes etc.
> Never had cause to use a PGP-style web of trust model in practice
> though so far.)

Right; as far as I know the only systems that get this right are apt and
yum, and they have an arguably centralized model of trust. Maven is the
closest anyone's gotten outside an OS, but I think they're just too vast
of a community to pull it off; it's difficult to build a web of trust
when the requirement for signing was added so late.

I think the Clojure community has a chance to do better because we have
a huge head start by building on Maven, but we're still small enough
that most library authors know each other and even meet face-to-face
from time to time. But it is attempting to do something that hasn't been
done before, so there's a quixotic element to the whole plan.

> I think I'm finally seeing where you're heading with this and it
> actually sounds quite exciting *if*:
>
> * it can be pulled off without nagging people to the point where they
> blindly trust everything just to get the warnings out of their face. I
> agree that having good out of the box trust is absolutely crucial here.

This will require having unsigned dependencies be a rarity. It will
depend on getting the community on board. I'm thinking of submitting a
talk to the Conj on the topic; we'll see how that goes. But it
definitely needs the experience to be as seamless as possible.

> * we can manage to get strong adoption which I think means tearing down
> barriers like people having to stuff around installing GPG and agents.
> That doesn't mean dumbing things down so that people don't have to
> understand the concepts, it means making them convenient, especially
> making them more convenient than any alternatives.

Yeah, I think I was a little hasty with my first experiments; I was able
to get a very smooth out-of-the-box experience on Gnome, but I didn't
realize that we can't count on that level of seamless integration on
Macs. I suppose it might really come down to how good the Bouncy Castle
implementation is.

>> So far everything I've read about GPG has been very user-centric
>> rather than project-centric, but I think this should be doable. This
>> is on the consuming side.
>
> One interesting part is how a project with multiple people with release
> access would work. Share the project's private key between them?

I think so, but I haven't thought through the implications all the way.

>> On the publishing side, one way would be to only accept deploys signed
>> by the original author or a key directly trusted by the author. Key
>> rotation makes this interesting since the same author doesn't have the
>> same key forever; they will definitely need to delegate this
>> publishing trust in a way that we can distinguish from regular trust.
>
> As well as losing keys, which is going to happen. For that reason alone
> I'd be inclined to treat them much like how we do the SSH keys.

True. That means initially an attacker getting access to Clojars
credentials would allow them to publish trusted artifacts, but as soon
as it was detected the key could be revoked. What GitHub does is send
out emails whenever a new SSH pubkey is added to your account; we could
probably do the same to detect intrusions more quickly.

-Phil

Alex Osborne

Jun 18, 2012, 6:27:41 PM

to clojars-m...@googlegroups.com

On 06/19/2012 04:10 AM, Phil Hagelberg wrote:

> Interesting idea. I looked into Bouncy Castle and it claimed to
> implement signing and verification but not in a way that integrated with
> the agent, so I didn't look much further because always getting prompted
> is a lousy experience. But it could serve as a fallback considering we
> can't really rely on the agent consistently anyway.

I did come across this earlier:

http://codenav.org/code.html?project=/org/sonatype/oss/oss-parent/7&path=/Source%20Packages/org.kohsuke.maven.pgp.loaders/GpgAgentPassPhraseLoader.java

Not sure how much point there'd be in using that though: if there's an
agent running, you're going to have gpg installed anyway.

>> As well as losing keys, which is going to happen. For that reason alone
>> I'd be inclined to treat them much like how we do the SSH keys.
>
> True. That means initially an attacker getting access to Clojars
> credentials would allow them to publish trusted artifacts, but as soon
> as it was detected the key could be revoked. What GitHub does is send
> out emails whenever a new SSH pubkey is added to your account; we could
> probably do the same to detect intrusions more quickly.

It would allow the attacker to publish artifacts, but not trusted ones
unless there's a trust path to the attacker's key? Isn't the whole point
of the signing exercise that if Clojars is compromised or MITM'd, all is
not lost? :)

Phil Hagelberg

Jun 18, 2012, 7:35:43 PM

to clojars-m...@googlegroups.com

On Mon, Jun 18, 2012 at 3:27 PM, Alex Osborne <a...@meshy.org> wrote:
> http://codenav.org/code.html?project=/org/sonatype/oss/oss-parent/7&path=/Source%20Packages/org.kohsuke.maven.pgp.loaders/GpgAgentPassPhraseLoader.java
>
> Not sure how much point there'd be in using that though: if there's an agent
> running, you're going to have gpg installed anyway.

Hm; well it might let us simplify and only have a single code path vs
shelling out when we can and falling back to in-process calls
otherwise.

>> True. That means initially an attacker getting access to Clojars
>> credentials would allow them to publish trusted artifacts, but as soon
>> as it was detected the key could be revoked. What GitHub does is send
>> out emails whenever a new SSH pubkey is added to your account; we could
>> probably do the same to detect intrusions more quickly.
>
>
> It would allow the attacker to publish artifacts, but not trusted ones
> unless there's a trust path to the attacker's key? Isn't the whole point
> of the signing exercise that if Clojars is compromised or MITM'd, all is
> not lost? :)

Quite right; sloppy thinking on my part.

-Phil

Phil Hagelberg

Jun 18, 2012, 2:30:30 PM

to clojars-m...@googlegroups.com

On Mon, Jun 18, 2012 at 8:55 AM, Jack Moffitt <ja...@metajack.im> wrote:
> Note that many places just require you to trust the repository, and
> then the repository maintainers have revocation on the sub-certs. All
> the App Stores work this way, and it seems a good middle ground. If a
> bad actor is detected, clojars could pull the cert. It seems a little
> more user friendly than having to expose the trust network. For
> starters, I might decide I trust various people on the clojure mailing
> list who post a lot, but there is no way for me to know that's who
> published a jar unless they also sign their emails or I just trust the
> author information.

Yeah, this is common among centralized schemes, but I don't want to
get into the business of policing content. We don't have anyone
working on this full-time, and as soon as we get into that game I
think people are going to start expecting a more active role from us.
Also with the current proposal you don't even have to trust Clojars;
if Clojars were compromised it wouldn't be possible to attack user
builds.

-Phil

Phil Hagelberg

Jul 3, 2012, 1:34:48 AM

to clojars-m...@googlegroups.com

I've sketched out an implementation of artifact promotion here:

https://github.com/technomancy/clojars-web/compare/promote

Every file that gets deployed gets placed on an in-memory
LinkedBlockingQueue, checked for blockers, and promoted to S3 if none
are found.
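
The check-for-blockers step could look roughly like the sketch below. It is not the code on the promote branch: the shape of the deployment map (`:version`, `:files`), the function names, and the two particular blockers (snapshot version, missing .asc signature for a jar or pom) are assumptions chosen to match the requirements discussed in this thread.

```clojure
(require '[clojure.string :as str])

;; Hypothetical deployment shape:
;; {:version "1.0.0" :files ["a-1.0.0.jar" "a-1.0.0.jar.asc" ...]}

(defn snapshot? [^String version]
  (.endsWith version "-SNAPSHOT"))

(defn unsigned-files
  "Returns the jar and pom files that have no matching .asc file."
  [files]
  (let [present (set files)]
    (->> files
         (filter #(re-find #"\.(jar|pom)$" %))
         (remove #(contains? present (str % ".asc"))))))

(defn blockers
  "Returns a vector of reasons the deployment cannot be promoted.
  An empty vector means it may go to the releases repository."
  [{:keys [version files]}]
  (let [unsigned (unsigned-files files)]
    (vec (remove nil?
                 [(when (snapshot? version)
                    "snapshot versions cannot be promoted")
                  (when (seq unsigned)
                    (str "missing signatures for: "
                         (str/join ", " unsigned)))]))))
```

This sketch only checks that signature files are present; verifying them against a key is a separate step.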

A few TODOs and open questions:

* Checks for signatures are unimplemented; should be easy to take code
from Leiningen for this unless we want to investigate bouncycastle.

* I think we'd want to set up periodic re-processing of unpromoted
artifacts in order to catch ones that could fall through the cracks,
for example from having the JVM restarted before the promotion queue
is empty. I think this lets us off the hook for having stateful
in-memory queues, but we'll have to see how practical it is in
practice. I was thinking of exposing a web endpoint to initiate queueing
up all unpromoted non-snapshots and hitting it via curl from a cron job.

* We need to expose the list of blockers for a given deployment's
promotion in the web interface, probably only to owners of a given
jar.

* I'm not sure of the best way to store AWS credentials during development.
We don't want them checked into dev-resources/config.clj, so I've added
"local-resources" to :resource-paths in the dev profile.

* Do we have a name for the list of all artifacts involved in a single
deployment? I think the term "jars" is used for that in the codebase
currently, but it's not quite the right word.

Of course everything here is still open for discussion, but I thought
having some code to discuss might be helpful. I think the signature
checks are really the only thing preventing this from being usable as a
proof-of-concept.

-Phil

Hugo Duncan

Jul 3, 2012, 11:05:50 AM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> * I think we'd want to set up periodic re-processing of unpromoted
> artifacts in order to catch ones that could fall through the cracks,
> for example from having the JVM restarted before the promotion queue
> is empty. I think this lets us off the hook for having stateful
> in-memory queues, but we'll have to see how practical it is in
> practice. I was thinking of exposing a web endpoint to initiate queueing
> up all unpromoted non-snapshots and hitting it via curl from a cron job.

Would the set of unpromoted artifacts be bounded, or will this grow over time?

> * I'm not sure of the best way to store AWS credentials during development.
> We don't want them checked into dev-resources/config.clj, so I've added
> "local-resources" to :resource-paths in the dev profile.

Having the credentials in a configuration file would seem safer to me,
so you don't have jars on disc containing your credentials. I guess the
"local-resources" directory could just be excluded from the jar.

How would this work in production - would you start the process with an
extra classpath argument pointing to the location of the production
credentials?

> * Do we have a name for the list of all artifacts involved in a single
> deployment? I think the term "jars" is used for that in the codebase
> currently, but it's not quite the right word.

release artifacts?

Hugo

Phil Hagelberg

Jul 3, 2012, 1:17:42 PM

to clojars-m...@googlegroups.com

In my last mail I said signatures were the one thing blocking using it
as a proof-of-concept, but flipping a "promoted" flag and having that
cause the snapshots repo to refuse overwriting deploys also needs
to be implemented.

Hugo Duncan <hu...@hugoduncan.org> writes:

> Would the set of unpromoted artifacts be bounded, or will this grow over time?

Maybe we could have the hourly cron job only act upon the last 24 hours
or so worth of artifacts, and if that turns out to be insufficient we
could add a weekly one for the entire set. If it takes a while it would
be easy to have a foreground and background queue so direct handling of
uploads wouldn't get clogged up with the background processing.
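
One way the foreground/background split might be sketched: the worker always prefers the foreground queue (direct uploads) and only falls back to the background queue (periodic re-processing) when nothing is waiting. The function name and the timeout parameter are hypothetical.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

(defn next-item
  "Returns the next item to process, preferring `foreground` over
  `background`. Waits up to `timeout-ms` on the foreground queue before
  checking the background one; returns nil if both are empty."
  [^LinkedBlockingQueue foreground ^LinkedBlockingQueue background timeout-ms]
  (or (.poll foreground (long timeout-ms) TimeUnit/MILLISECONDS)
      (.poll background)))
```

With this shape a large background batch can delay a fresh upload by at most one background item's processing time, since the foreground queue is re-checked before every item.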

> Having the credentials in a configuration file would seem safer to me,
> so you don't have jars on disc containing your credentials. I guess the
> "local-resources" directory could just be excluded from the jar.

Yeah, keeping it in the :dev profile takes care of keeping it out of jars.

> How would this work in production - would you start the process with an
> extra classpath argument pointing to the location of the production
> credentials?

We already have a production-specific config file, so it's just a matter
of adding the credentials there.

>> * Do we have a name for the list of all artifacts involved in a single
>> deployment? I think the term "jars" is used for that in the codebase
>> currently, but it's not quite the right word.
>
> release artifacts?

Hm; so the abstract grouping of all artifacts with a given
group/artifact/version is a "release"? But that implies non-snapshots,
which isn't always the case. Maybe a "deployment"?

-Phil

Phil Hagelberg

Jul 4, 2012, 12:16:27 AM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> I've sketched out an implementation of artifact promotion here:
>
> https://github.com/technomancy/clojars-web/compare/promote

I just pushed signature checks and a promoted_at flag, so this could be
considered a working minimum feature set for a releases repository. It's
only lightly tested, but I would love to get some review on it and hear
people's thoughts.

Of particular interest is the bare-bones migration namespace I had to
add to get the promoted_at flag added to the jars table:

https://github.com/technomancy/clojars-web/blob/promote/src/clojars/db/migrate.clj

I know there are more sophisticated migration libraries for Clojure, but
in my opinion they do too much; I believe the notion of application
portability between databases to be a pleasant fiction that is not
practical in real-world applications. Arguably what I've got here is
oversimplified; for instance it does not support reversal of migrations,
but in my experience such features also don't work out in practice very
well since they never get tested, and reversing schema changes almost
always needs close human attention anyway. But I'm open to alternate
suggestions here.
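
To make the forward-only idea concrete, here is a minimal sketch that does not reproduce the linked migrate.clj. Migrations are assumed to be an ordered vector of [id fn] pairs, and the set of already-applied ids plus the `record!` callback stand in for whatever bookkeeping table the real code uses; all names are hypothetical.

```clojure
(defn pending
  "Returns the migrations, in order, whose id is not in `applied`.
  `migrations` is a vector of [id f] pairs; `applied` is a set of ids."
  [migrations applied]
  (remove (fn [[id _]] (contains? applied id)) migrations))

(defn migrate!
  "Runs each pending migration once, in order, then records its id by
  calling `record!`. There is deliberately no way to reverse a migration."
  [migrations applied record!]
  (doseq [[id f] (pending migrations applied)]
    (f)
    (record! id)))
```

Running `migrate!` a second time against the updated set of applied ids does nothing, which is what makes it safe to call on every startup.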

Background jobs to catch stragglers, as well as exposing the list of
blockers in the web UI, remain to be implemented.

Thoughts?

-Phil

Alex Osborne

Jul 5, 2012, 8:24:12 PM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> I've sketched out an implementation of artifact promotion here:
>
> https://github.com/technomancy/clojars-web/compare/promote
>
> Every file that gets deployed gets placed on an in-memory
> LinkedBlockingQueue, checked for blockers, and promoted to S3 if none
> are found.

Cool, looks good.

What happens in the failure scenario (eg S3 is unavailable)? I guess
the transaction's rolled back, an exception's propagated and the worker
thread dies?

Also, maybe I'm misreading but it looks like the worker thread currently
only ever promotes one upload and then exits?

(defn start []
  (.start (Thread. #(promote (.take queue)))))

> A few TODOs and open questions:
>
> * Checks for signatures are unimplemented; should be easy to take code
> from Leiningen for this unless we want to investigate bouncycastle.
>
> * I think we'd want to set up periodic re-processing of unpromoted
> artifacts in order to catch ones that could fall through the cracks,
> for example from having the JVM restarted before the promotion queue
> is empty. I think this lets us off the hook for having stateful
> in-memory queues, but we'll have to see how practical it is in
> practice. I was thinking of exposing a web endpoint to initiate queueing
> up all unpromoted non-snapshots and hitting it via curl from a cron job.

Sounds fine to me. I guess alternatively it could be a timer thread but
I generally like the cron approach as:

* it makes it easy to trigger manually

* it's easy to turn off temporarily which might come in handy at some
point

* it plays reasonably well with the webapp failover. If the primary
webapp instance gets stuck the secondary will start queuing things.

> * Do we have a name for the list of all artifacts involved in a single
> deployment? I think the term "jars" is used for that in the codebase
> currently, but it's not quite the right word.

I tend to think of them as a "push" or "upload" but that may not be the
best terminology either.

Alex Osborne

Jul 5, 2012, 8:40:31 PM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> Of particular interest is the bare-bones migration namespace I had to
> add to get the promoted_at flag added to the jars table:
>
> https://github.com/technomancy/clojars-web/blob/promote/src/clojars/db/migrate.clj
>
> I know there are more sophisticated migration libraries for Clojure, but
> in my opinion they do too much; I believe the notion of application
> portability between databases to be a pleasant fiction that is not
> practical in real-world applications.
>
> Arguably what I've got here is
> oversimplified; for instance it does not support reversal of migrations,
> but in my experience such features also don't work out in practice very
> well since they never get tested, and reversing schema changes almost
> always needs close human attention anyway. But I'm open to alternate
> suggestions here.

+1

I like the straightforward approach you've taken with the migrations.

I've never quite gotten the point of migration libraries outside of the
case where you're trying to target more than one database and thus need
a thick SQL translation layer.

There are reasons for attempting db portability (eg you're trying to
create a product that integrates well with your customer's existing
infrastructure) but we've got no reason to do that and would just be
making things harder.

Phil Hagelberg

Jul 8, 2012, 12:09:35 AM

to clojars-m...@googlegroups.com

On Thu, Jul 5, 2012 at 5:24 PM, Alex Osborne <a...@meshy.org> wrote:
> What happens in the failure scenario (eg S3 is unavailable)? I guess
> the transaction's rolled back, an exception's propagated and the worker
> thread dies?
>
> Also, maybe I'm misreading but it looks like the worker thread currently
> only ever promotes one upload and then exits?
>
> (defn start []
>   (.start (Thread. #(promote (.take queue)))))

Quite right; how silly of me. Good catch. It now logs exceptions to
stdout and continues. Is stdout appropriate for this kind of thing?
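
For reference, a loop of roughly this shape keeps the worker alive across uploads and failures. It is only a sketch: `promote` is passed in as an argument here rather than being the real function, and the `println` logging is an assumption based on the description above, not the code on the branch.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

(defn start
  "Starts a daemon thread that takes items off `queue` forever and calls
  `promote` on each. Exceptions thrown by `promote` are printed to stdout
  and the loop carries on with the next item."
  [^LinkedBlockingQueue queue promote]
  (doto (Thread.
         (fn []
           (loop []
             ;; .take blocks until an item is available
             (let [item (.take queue)]
               (try
                 (promote item)
                 (catch Exception e
                   (println "promotion failed for" item "-" (.getMessage e)))))
             (recur))))
    (.setDaemon true)
    (.start)))
```

The `recur` sits outside the `try` because Clojure does not allow recurring from inside one. An interrupt during `.take` still ends the thread, which gives a way to shut the worker down.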

> Sounds fine to me. I guess alternatively it could be a timer thread but
> I generally like the cron approach as:
>

> * it plays reasonably well with the webapp failover. If the primary
> webapp instance gets stuck the secondary will start queuing things.

Yes, in a sense the load balancer acts as a poor man's leader
election algorithm.

>> * Do we have a name for the list of all artifacts involved in a single
>> deployment? I think the term "jars" is used for that in the codebase
>> currently, but it's not quite the right word.
>
> I tend to think of them as a "push" or "upload" but that may not be the
> best terminology either.

"A deployment of artifacts" perhaps? It's not any weirder than "a
murder of crows".

> I've never quite gotten the point of migration libraries outside of the
> case where you're trying to target more than one database and thus need
> a thick SQL translation layer.

I think the idea is that different schemas apply on different
branches, so having a way to say "bring me up to the latest" is
important simply to support distributed development. For most
databases I suppose down migrations are more important because you'd
want to be able to go back to development on master once you're done
trying out my changes on the "promote" branch, but since it's easy to
simply make a copy of master's dev_db and bring that back into place
once you're back on master it's less of an issue with SQLite.

But most migration frameworks complect "tracking a series of
operations on schema and data" with "allowing schema changes to be
specified in code portably across all backends", the latter of which
is basically impossible to implement in a way that covers all the edge
cases.

-Phil
