Now that the bcrypt functionality[1] has been deployed, I'd like to
move forward with a further proposal for a revamped Clojars. I'd be
happy to get your opinions on the following.
thanks,
Phil
[1] - https://groups.google.com/group/clojure/browse_thread/thread/5e0d48d2b82df39b
# Proposal for Next-Generation Clojars
Clojars should be a frontend for deploying jars directly to a
repository hosted on S3 for improved redundancy and reliability.
## Separate Repositories
It's desirable to have as few snapshot repositories in your list as
possible in order to speed up dependency resolution. In addition, the
presence of snapshots introduces undesirable semantics for version
ranges. Because of this, the final release of Leiningen 2 will not
have any snapshot repositories in its default list. This means Clojars
will need to serve a separate repository to segregate releases from
snapshots.
Rather than the Maven world's traditional snapshots/releases split,
the original repository will be left alone as an "anything goes" repo
while the new next-generation repository would only accept releases.
## Maintaining Backwards Compatibility
Leiningen 1 will only check the old repository, but a redirect rule
can be added to it so that artifacts that are not found can fall
through to the new repository. Some restrictions may be required here;
for instance artifacts that have had pushes to the new repository
should not accept uploads to the old anymore.
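
To make the fall-through and the upload restriction concrete, here is a rough sketch. It assumes each repository can be modeled as a set of artifact paths; `resolve`, `old_repo_accepts_upload`, and the `example/lib` coordinates are hypothetical names for illustration, not anything that exists in Clojars today.

```python
# Hypothetical model: each repository is just a set of artifact paths.

def resolve(path, old_repo, new_repo):
    """Return which repository should serve `path`, or None if neither has it."""
    if path in old_repo:
        return "old"
    if path in new_repo:
        # Not found in the old repository: fall through (redirect) to the new one.
        return "new"
    return None


def old_repo_accepts_upload(group, artifact, new_repo_projects):
    """Refuse uploads to the old repository once a project has pushed to the new one."""
    return (group, artifact) not in new_repo_projects


# Illustrative data only.
old_repo = {"example/lib/0.5.0/lib-0.5.0.jar"}
new_repo = {"example/lib/0.6.0/lib-0.6.0.jar"}
new_repo_projects = {("example", "lib")}
```

A real deployment would presumably express the fall-through as an HTTP redirect rule on the old repository rather than as application code.
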
In addition, compatibility needs to go the other way. At the time the
new repository is launched, it should be seeded with all non-snapshot
artifacts from the old that are grandfathered in. The two repositories
should share a single user database and group permissions list.
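
As a sketch of the seeding step, assuming artifacts are listed as (group, artifact, version) triples and that snapshots follow the usual Maven convention of a version ending in `-SNAPSHOT`. The function names are hypothetical.

```python
def is_snapshot(version):
    """Maven convention: snapshot versions end in -SNAPSHOT."""
    return version.endswith("-SNAPSHOT")


def releases_to_seed(artifacts):
    """Keep only the (group, artifact, version) triples that are releases."""
    return [a for a in artifacts if not is_snapshot(a[2])]


# Illustrative data only.
old_repo_artifacts = [
    ("example", "lib", "0.5.0"),
    ("example", "lib", "0.6.0-SNAPSHOT"),
    ("org.example", "tool", "1.0.0"),
]
seed = releases_to_seed(old_repo_artifacts)
```
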
## Requirements
The new repository will not accept snapshots. We may also want to
impose additional requirements on uploads. One possibility would be to
require signing of jars. We need to investigate how this can be done
with minimal hassle and discuss further policies that can be built
around it.
Another possibility is to require more elements of the pom to be
filled out. In particular, fields of interest include description,
URL, licenses, and SCM availability. Having a richer set of metadata
available could prove valuable down the line.
## Uploads
Uploads to the new repository should be done over HTTP using the
standard Maven deployment mechanism. This will allow greater tooling
interoperability as well as reusing existing code and shelving the
custom SCP uploader. Jars once uploaded should be stored on S3 so that
downtime of the machines hosting the Clojars application will not
affect availability of jars. It will also make setting up a Clojars
instance for development much simpler since everything can be done
in-process.
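
For reference, the standard Maven repository layout determines the URL a file is uploaded to (Maven's HTTP deployment issues a PUT to that location). The sketch below only computes the target URL; the repository URL and the function names are assumptions made up for illustration.

```python
def maven_path(group, artifact, version, extension="jar"):
    """Path of a file inside a Maven 2 layout repository."""
    group_dir = group.replace(".", "/")  # dots in the group become directories
    filename = f"{artifact}-{version}.{extension}"
    return f"{group_dir}/{artifact}/{version}/{filename}"


def deploy_url(repo_url, group, artifact, version, extension="jar"):
    """URL that a deploy of this file would target."""
    return repo_url.rstrip("/") + "/" + maven_path(group, artifact, version, extension)


# Hypothetical repository URL, not a real Clojars endpoint.
url = deploy_url("https://repo.example.invalid/releases/", "org.example", "lib", "1.0.0")
```

A deploy also uploads the pom next to the jar, which is why `extension` is a parameter.
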
## Migration
The first step is to migrate the database to PostgreSQL. Then both the
old and new repositories can share a database. Once the HTTP
upload functionality is finished, the new repository should be seeded
with releases from the old, and it can start accepting new uploads.
In the interests of full disclosure I should mention that we are
planning on running the new Clojars repository on Heroku.
-Phil
> I like this plan and would suggest that in the new repo, jars with
> duplicate version numbers should not be permitted. For example, this
> directory shows several jars with the same version number:
> http://clojars.org/repo/aleph/core/0.6.0-SNAPSHOT/
This should be taken care of by disallowing snapshots, actually.
> I also experimented and noticed that if I use "lein push" multiple
> times with the same (release) version number then I overwrite a jar
> already on clojars. I think the new Clojars should instead have a
> policy of immutable dependencies, so that users of these libraries
> can be sure the bytecode won't change. Probably requiring
> monotonicity in version numbers is also preferable.
Good idea; this should be explicitly enforced.
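
A minimal sketch of what enforcing immutability could look like, assuming releases are keyed by (group, artifact, version). `ReleaseStore` is a hypothetical in-memory stand-in for the real storage and does not attempt the version-monotonicity check.

```python
class ReleaseStore:
    """In-memory stand-in for a release repository that never overwrites."""

    def __init__(self):
        self._blobs = {}

    def put(self, coordinates, data):
        """Store `data` under `coordinates`; refuse if that release already exists."""
        if coordinates in self._blobs:
            raise ValueError(f"release {coordinates} already exists and is immutable")
        self._blobs[coordinates] = data

    def get(self, coordinates):
        return self._blobs[coordinates]
```

With this in place, a second push of the same release version fails instead of silently replacing the jar.
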
> What about linking the jars (in an automated way) to github? If lein/
> clojars could easily match artifacts to a github commit, that would
> perhaps serve as a source of rich metadata.
The poms generated by Leiningen automatically contain links to the git
repository when they are generated from a project that's a git checkout.
But perhaps we should go the extra mile and require this.
> I was also quite intrigued by the suggestion in Clojurescript One
> that was forked to this plugin:
> https://github.com/tobyhede/lein-git-deps
> The idea that git urls could be a useful way to manage source
> dependencies. I'm naively wondering if it might be good for the new
> Clojars to index dependency releases hosted on github.
Keeping the scm metadata around is a fine idea. The idea of generating a
repo full of jars based on tags found in registered project git repos is
one that's occurred to me (indeed, this is what
http://melpa.milkbox.net/ does) but I think that would be a different
kind of repository from Clojars.
-Phil
Do separate repos get more than just shorter defaults? Does
{"clojars-releases" {:url "http://clojars.org/repo" :snapshots false}}
or
<repositories>
  <repository>
    <id>clojars-releases</id>
    <url>http://clojars.org/repo</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
achieve the same things?
> ## Requirements
...
> Another possibility is to require more elements of the pom to be
> filled out. In particular, fields of interest include description,
> URL, licenses, and SCM availability. Having a richer set of metadata
> available could prove valuable down the line.
I'm a fan of using more info from the pom. Maybe not of requiring it.
Looking at a page like http://rubygems.org/gems/rails and being able
to see all the places to find info is nice.
>
> ## Uploads
>
> Uploads to the new repository should be done over HTTP using the
> standard Maven deployment mechanism. This will allow greater tooling
> interoperability as well as reusing existing code and shelving the
> custom SCP uploader. Jars once uploaded should be stored on S3 so that
> downtime of the machines hosting the Clojars application will not
> affect availability of jars. It will also make setting up a Clojars
> instance for development much simpler since everything can be done
> in-process.
+1 for HTTP uploads and S3. HTTP uploads will make testing much
easier than faking SCP.
>
> ## Migration
>
> The first step is to migrate the database to PostgreSQL. Then both the
> old and new repositories can share a database. Once the HTTP
> upload functionality is finished, the new repository should be seeded
> with releases from the old, and it can start accepting new uploads.
What about search? Currently it is some SQLite queries. Are there
plans to continue that in Postgres for a while? Any thoughts on Lucene
or, since it will be on Heroku, using
https://addons.heroku.com/searchify ?
I think it would have to live at a different URL. It could be the same
domain though.
> I'm a fan of using more info from the pom. Maybe not of requiring it.
> Looking at a page like http://rubygems.org/gems/rails and being able
> to see all the places to find info is nice.
I think requiring a license and URL is pretty reasonable. Description
helps a lot for search results, but I could budge on it. On the other
hand it doesn't take long to fill out. We should also check for the
FIXME defaults that Leiningen places in new project skeletons and
reject those.
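
Here is a rough sketch of such a check, assuming poms use the standard Maven 4.0.0 namespace and that leftover skeleton defaults can be recognized by the string "FIXME". The function name and the exact policy (URL and license required, description only checked for placeholders) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_problems(pom_xml):
    """Return a list of reasons to reject a pom; an empty list means it passes."""
    root = ET.fromstring(pom_xml)
    url = (root.findtext("m:url", default="", namespaces=POM_NS) or "").strip()
    description = (root.findtext("m:description", default="", namespaces=POM_NS) or "").strip()

    problems = []
    if not url:
        problems.append("missing url")
    if root.find("m:licenses/m:license", POM_NS) is None:
        problems.append("missing license")
    # Assumption: unedited skeleton values still contain the word FIXME.
    for name, value in (("url", url), ("description", description)):
        if "FIXME" in value:
            problems.append(f"{name} still contains a FIXME placeholder")
    return problems
```

Returning every problem at once, rather than failing on the first, would let the uploader fix the pom in a single pass.
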
> What about search? Currently it is some SQLite queries. Are there
> plans to continue that in Postgres for a while? Any thoughts on Lucene
> or, since it will be on Heroku, using https://addons.heroku.com/searchify ?
I was originally thinking Lucene since we've already got the code for
searching that in lein search, but I think it's up to whoever
implements it. Postgres full-text search would have the advantage of
living in the database, and thus it wouldn't have to be rebuilt when a
new node is spun up. Fewer moving parts is always nice.
-Phil