Fixing bugs, adding features

Phil Hagelberg

Mar 23, 2011, 11:56:47 PM

to clojars-m...@googlegroups.com

We had some folks pipe up in the #clojure channel today that were
interested in helping out with the Clojars codebase. It would be great
to come up with a list of things that could get hacked on; maybe a
plan for attacking the higher-priority issues. I'd like to just get
the ball rolling for that.

My top hits would be:

#1: take advantage of pom.xml inside jars if present
#23: use lucene for searching (could steal code from lein-search for this)
#24: keep snapshots and releases in separate repositories
#2: browse interface
#5: display dependencies (and possibly project.clj) on show page

But of course if there's something specific that catches people's eye,
then who am I to tell you what to do? =) I may factor out the lucene
code from lein-search into its own project.

Of course, writing new code is only helpful if it can be deployed. I
remember talk of migrating off sqlite. As of the last discussion on
this list that was written but not deployed. But the code in
/home/clojars/prod seems to be up to date with the latest master
branch, so is it currently running against couch?

If development were to proceed, should it happen from master? There's
also the question of deploying in general--what's the process for
that? How are the processes daemonized? Hugo mentioned being willing
to help automate the deployment using Pallet--that would certainly
make it easier for people to test out their changes by deploying to a
local virtualbox.

If this gets cleaned up and documented, it might even make a good
resource for documenting how to deploy and run a Clojure webapp in
general, since that's something that doesn't seem to be very well
understood.

cheers,
Phil

Alex Osborne

Mar 24, 2011, 2:10:02 AM

to clojars-m...@googlegroups.com

Phil Hagelberg <ph...@hagelb.org> writes:

> We had some folks pipe up in the #clojure channel today that were
> interested in helping out with the Clojars codebase. It would be great
> to come up with a list of things that could get hacked on; maybe a
> plan for attacking the higher-priority issues. I'd like to just get
> the ball rolling for that.
>
> My top hits would be:
>
> #1: take advantage of pom.xml inside jars if present
> #23: use lucene for searching (could steal code from lein-search for this)
> #24: keep snapshots and releases in separate repositories
> #2: browse interface
> #5: display dependencies (and possibly project.clj) on show page

All good ideas. A couple more:

* Something better for the account stuff. A "password reset" email
thing would be a good start. (Or maybe even OpenID.)

* Some sort of integration with one or more of the various documentation
sites that have sprung up.

> But of course if there's something specific that catches people's eye,
> then who am I to tell you what to do? =) I may factor out the lucene
> code from lein-search into its own project.
>
> Of course, writing new code is only helpful if it can be deployed. I
> remember talk of migrating off sqlite. As of the last discussion on
> this list that was written but not deployed. But the code in
> /home/clojars/prod seems to be up to date with the latest master
> branch, so is it currently running against couch?

I ditched the couchdb attempt. It was taking too long, added lots of
dependencies (Erlang etc) and stuff that needed configuring for little
benefit. I was also having problems with couchdb crashing, although
that's no doubt been fixed by now.

The "prod" branch on my GitHub is what's in production; it should be
identical to "master" most of the time.

> If development were to proceed, should it happen from master?

Correct.

> There's also the question of deploying in general--what's the process
> for that? How are the processes daemonized?

There are two identical instances of the application running: "clojars"
and "clojars-backup". They're daemonized by Upstart (Ubuntu's replacement
for /etc/init.d) and configured to kill themselves with kill -9 (and thus
be respawned by Upstart) on Java out-of-memory errors (which never really
happens; it's just a safety habit).

The app runs with an embedded Jetty from the uberjar generated by lein.
The command looks kind of hideous but it's really just java -jar with a
bunch of extra logging options enabled.

$ cat /etc/init/clojars.conf
description "Clojars webapp (production)"

respawn
start on filesystem
stop on shutdown

chdir /home/clojars/prod
exec su clojars -c 'java -Dnla.node=clojars -Xmx32m -server "-XX:OnOutOfMemoryError=kill -9 %p" -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCDateStamps -jar current-standalone.jar 8001 7601 2>&1 | /usr/bin/cronolog -S /logs/clojars.log /logs/%Y%m/clojars.%Y-%m-%d.log'

The stdout/err output is piped into cronolog for log rotation.
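cronolog expands the strftime-style specifiers in its template, so each day's output gets its own file inside a per-month directory. As a quick illustration (assuming GNU date, for its -d option), the same template passed to date shows which file a given day's log ends up in:

```bash
#!/usr/bin/env bash
# Print the file cronolog would write to on a given day, using the same
# strftime-style template as the Upstart job above. Assumes GNU date (-d).
log_path_for() {
  local day=$1   # e.g. 2011-03-24
  date -d "$day" +'/logs/%Y%m/clojars.%Y-%m-%d.log'
}

log_path_for 2011-03-24   # prints /logs/201103/clojars.2011-03-24.log
```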

The two port numbers passed to -main are the web port (8001) and the
nailgun port for the SSH integration (7601). The backup instance uses
8002 and 7602.

You can stop and start clojars just like any other service on Ubuntu:

sudo stop clojars
sudo start clojars
sudo restart clojars
sudo status clojars

In normal operation clojars-backup is not hit at all. It's mainly there
as a safety measure in case the main app hangs, so that you can do a
sanity check when deploying before going live and so that you can do an
outage-free deploy (although logged-in users will lose their sessions, as
sessions aren't currently stored in the DB).

The process of deploying is just to pull from git, re-uberjar and then
restart clojars-backup. If it looks OK, then restart the primary
instance. I just use this shell script:

ato@clojars:~/bin$ cat deploy-clojars
#!/bin/bash
set -e

cd /home/clojars/prod
sudo -H -u clojars git pull
sudo -H -u clojars ~clojars/bin/lein uberjar
sudo restart clojars-backup

echo "Test changes at: http://clojars.org:8002/"
echo "If ok run: sudo restart clojars"
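Part of the manual sanity check could be automated by waiting for the backup instance to answer HTTP before printing the "test changes" hint. A minimal sketch, assuming curl is installed; wait_for_http is a hypothetical helper, not something in the current deploy script, and it only checks that the app responds, not that the changes look right:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll URL until it answers with an HTTP status below
# 400, or give up after TRIES attempts spaced DELAY_SECONDS apart.
# Usage: wait_for_http URL [TRIES] [DELAY_SECONDS]
wait_for_http() {
  local url=$1 tries=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= tries; i++)); do
    # -f: fail on HTTP errors (>= 400); -s: silent; -o: discard the body;
    # --max-time: don't hang on an instance that accepts but never answers
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# In deploy-clojars, right after "sudo restart clojars-backup":
#   wait_for_http http://localhost:8002/ || { echo "backup not responding" >&2; exit 1; }
```

Failing before the echo lines means the primary is never restarted on top of a backup that didn't come up.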

The web site is fronted by nginx, which handles the failover between the
two instances and serves the static content and the repository itself.
This means that even if the webapp is down people can still download
stuff from the repository.

$ cat /etc/nginx/sites-available/clojars
upstream clojars-web {
    server localhost:8001 max_fails=3;
    server localhost:8002 max_fails=3 backup;
}

server {
    listen 80;
    server_name clojars.org;
    root /home/clojars/prod/public;
    access_log /var/log/nginx/clojars.access.log;

    location / {
        # try static content first, then fall through to the webapp
        try_files $uri @clojars_webapp;
    }

    location /repo {
        root /home/clojars;
        autoindex on;
    }

    location @clojars_webapp {
        proxy_pass http://clojars-web;
    }

    ##
    ## Linked repositories
    ##

    location /repo/org/clojure {
        # snapshots must be checked first: a rewrite to an http:// URL
        # returns the redirect immediately, so later directives never run
        if ($uri ~ ".*-SNAPSHOT/.*") {
            rewrite ^/repo/(.*)$ http://build.clojure.org/snapshots/$1 permanent;
        }
        rewrite ^/repo/(.*)$ http://build.clojure.org/releases/$1 permanent;
    }

    location /repo/org/xerial {
        rewrite ^/repo/(.*)$ http://www.xerial.org/maven/repository/artifact/$1 permanent;
    }
}
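To make the intent of the "Linked repositories" rules concrete, here is a small bash illustration (not part of the Clojars setup) of the mapping they are meant to produce: snapshot paths under org/clojure go to build.clojure.org's snapshots repository, other org/clojure paths to its releases repository, and org/xerial paths to xerial.org. It only approximates nginx's prefix matching, since the patterns here require a trailing slash.

```bash
#!/usr/bin/env bash
# Illustration only: print where a /repo/... request is meant to be
# redirected, or return 1 if it would be served from the local repo.
redirect_target() {
  local uri=$1
  local rest=${uri#/repo/}   # the part captured by ^/repo/(.*)$
  case "$uri" in
    /repo/org/clojure/*)
      # snapshots are checked before falling back to releases
      if [[ $uri == *-SNAPSHOT/* ]]; then
        echo "http://build.clojure.org/snapshots/$rest"
      else
        echo "http://build.clojure.org/releases/$rest"
      fi
      ;;
    /repo/org/xerial/*)
      echo "http://www.xerial.org/maven/repository/artifact/$rest"
      ;;
    *)
      return 1
      ;;
  esac
}
```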

Finally that leaves the nailgun/scp socket. For failover of that I use
a deliciously simple TCP load balancer called 'balance'.

http://www.inlab.de/balance.html

Again that just runs out of Upstart:

$ cat /etc/init/clojars-scp-balance.conf
description "Clojars scp balancer (production)"
respawn
start on filesystem
stop on shutdown
chdir /home/clojars
exec balance -b 127.0.0.1 8700 localhost:7601 ! localhost:7602

The generated authorized_keys file for the clojars user points the
nailgun client at port 8700.
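For illustration only, here is what one line of such a generated file might look like. The comma-separated options (command=, no-port-forwarding, no-X11-forwarding, no-agent-forwarding, no-pty) are standard OpenSSH authorized_keys options; a forced command runs regardless of what the client asked for, and the client's original request (e.g. the scp command) is available to it in SSH_ORIGINAL_COMMAND. The client binary name, the class name clojars.scp, passing the username as an argument, and the position of the --nailgun-port option are all assumptions, not the actual Clojars format.

```bash
#!/usr/bin/env bash
# Hypothetical generator for one authorized_keys line. Every scp login with
# this key runs the nailgun client, which talks to the balancer on NG_PORT.
# NG_CLIENT, the class name and the username argument are illustrative.
NG_CLIENT=${NG_CLIENT:-ng}
NG_PORT=${NG_PORT:-8700}

authorized_keys_line() {
  local user=$1 pubkey=$2
  printf 'command="%s clojars.scp --nailgun-port %s %s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
    "$NG_CLIENT" "$NG_PORT" "$user" "$pubkey"
}
```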

/etc/ssh/sshd_config turns off password prompts for the clojars and root
users.

Match User clojars,root
    PasswordAuthentication no

The Lucene indexing stuff is Sonatype's CLI nexus-indexer. I just run
it out of cron:

# crontab -u clojars -l
# m h dom mon dow command
*/15 * * * * java -jar ~/indexer/nexus-indexer-2.0.4-cli.jar -n clojars -i ~/indexer/index -d ~/repo/.index -r ~/repo -s -q -t min -l

Documented here:

https://docs.sonatype.org/display/M2ECLIPSE/Nexus+Indexer
http://www.sonatype.com/people/2009/06/nexus-indexer-api-part-1/
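Because the index is only refreshed from cron, a stalled or failing run is easy to miss. A minimal monitoring sketch, not something described above: check that a file the indexer rewrites on every run has been modified recently. Which file to watch (and that it is rewritten on each run) is an assumption; point it at whatever the indexer regenerates under ~/repo/.index.

```bash
#!/usr/bin/env bash
# Monitoring sketch: succeed only if FILE exists and was modified within the
# last MAX_MINUTES minutes (default 30, i.e. two missed 15-minute cron runs).
index_is_fresh() {
  local file=$1 max_minutes=${2:-30}
  [[ -e $file ]] || return 1
  # -maxdepth 0 tests FILE itself; -mmin -N matches if modified < N minutes ago
  [[ -n $(find "$file" -maxdepth 0 -mmin "-$max_minutes") ]]
}

# e.g. from another cron entry (the file name is a guess):
#   index_is_fresh ~/repo/.index/nexus-maven-repository-index.properties || echo "index stale"
```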

> Hugo mentioned being willing to help automate the deployment using
> Pallet--that would certainly make it easier for people to test out
> their changes by deploying to a local virtualbox.

Pallet is essentially something like Chef/Puppet, right? Mmm. I didn't
originally see the point in Pallet for Clojars. It's not like we're
ever going to need to spin up multiple servers for load reasons.

However your use case does make a lot of sense. Saves messing with the
SSH and nginx config on your development computer. The setup procedure
is going to be something like this:

aptitude install openjdk-6-jdk nginx balance sqlite3 cronolog nailgun

# install Leiningen

adduser clojars

cd /home/clojars
mkdir -p data repo .ssh

git clone https://github.com/ato/clojars-web.git prod
cd prod
lein uberjar

ln -s ../data/auth_keys /home/clojars/.ssh/authorized_keys
sqlite3 /home/clojars/data/db < clojars.sql

# the steps above run as root; hand the files over to the app's user
chown -R clojars:clojars /home/clojars

Then chuck in the nginx, cron, SSH and upstart config I mentioned
above.
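A small sanity check to run before (or after) the steps above: list which required commands are missing from PATH. The command names are assumptions based on the package list and the configs in this thread; the nailgun client in particular may be installed under a different binary name depending on the distribution, so it is left out of the example list.

```bash
#!/usr/bin/env bash
# Sketch: print each named command that is not found on PATH, one per line.
# Returns non-zero if anything is missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# e.g. (assumed names):
#   missing_commands java git nginx sqlite3 cronolog balance lein
```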

> If this gets cleaned up and documented, it might even make a good
> resource for documenting how to deploy and run a Clojure webapp in
> general, since that's something that doesn't seem to be very well
> understood.

I'm not sure whether this is a good way of deploying a Clojure webapp or
not. I haven't put a huge amount of thought into it. I'd certainly be
interested if anyone's got any comments.

The traditional Java model is with an external servlet container
(Tomcat, Jetty etc) and WAR files. This works quite well in a large
shop with a dedicated ops team, lots of monitoring, custom automation
and such but is not exactly simple or friendly for those without a Java
background.

An uberjar that just calls run-jetty like I do for Clojars is easy for
users, but requires a bit of effort from the developer, particularly if
you want to add important config options like the bind address, port,
path and "devel/test/prod" environment settings.
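One low-effort way to expose some of those options is at the launcher level. A rough bash sketch under stated assumptions: the current Clojars -main only takes the two positional ports, so the -Dapp.host, -Dapp.port and -Dapp.env property names and the defaults below are invented, and the webapp would have to read them itself (e.g. via System/getProperty).

```bash
#!/usr/bin/env bash
# Launcher sketch: rackup-style options with defaults, turned into a java
# command line. Property names and defaults are hypothetical.
build_java_cmd() {
  local host=127.0.0.1 port=8080 env=dev jar=current-standalone.jar opt
  local OPTIND=1   # reset so the function can be called more than once
  while getopts "b:p:e:j:" opt; do
    case "$opt" in
      b) host=$OPTARG ;;
      p) port=$OPTARG ;;
      e) env=$OPTARG ;;
      j) jar=$OPTARG ;;
      *) echo "usage: build_java_cmd [-b host] [-p port] [-e env] [-j jar]" >&2
         return 2 ;;
    esac
  done
  echo "java -Dapp.host=$host -Dapp.port=$port -Dapp.env=$env -jar $jar"
}
```

A real launcher would exec the command (ideally built as an array) rather than print it; printing keeps the sketch easy to check.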

Personally I'd like to see someone do a Ring equivalent of Ruby's
"rackup": just a simple little thing that provides your basic
command-line options, starts an embedded Jetty and can be used as an
uberjar main class.
