Fixing production systems on-the-fly


Krukow

Sep 4, 2009, 4:22:05 AM
to Clojure
I was thinking about the capability of changing production systems on
the fly. E.g. by having an accessible repl in a running production
system.

If you have a bug in a function, you can fix it by re-def'ing it -
that is great. However, suppose you want to do a system upgrade where
you want to change several things. Now you could just re-def each var
one at a time, but this might produce an inconsistent program in the
interval where you have re-def'ed some but not all vars.

The first thing you would want is a sort of atomic update of all
vars, similar to what is possible with refs. Is this possible
somehow? If not, are there any techniques or best practices for these
"system upgrades"?


/Karl

Christophe Grand

Sep 4, 2009, 7:38:22 AM
to clo...@googlegroups.com
You have to be prepared to deal with potential inconsistencies: a
closure (or any object) can hold a reference to the value of a
function.

(defn foo [x] (str x "v1"))
(def s (map foo [:a :b :c]))
(defn foo [x] (str x "v2"))
s ; (":av1" ":bv1" ":cv1")

Christophe
--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)

Jarkko Oranen

Sep 4, 2009, 7:45:31 AM
to Clojure
Hm, I don't think it's possible to have transactional updates
involving multiple Vars, but you can always keep functions in Refs:
(def foo (ref (fn foo [x] x)))

then, because Refs (like Vars) implement IFn and forward calls to the
function they hold, you can just do (foo 1) and it works. :)
When you want to update multiple refs, all you should need to do is
(dosync (ref-set foo newfn) (ref-set another-foo anothernewfn)) ...

This approach requires a bit of manual work and discipline, but
is probably worth investigating.
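
A minimal sketch of the idea, assuming two hypothetical functions, encode and decode, that only make sense as a pair (the names and the version prefixes are made up for illustration):

```clojure
;; Hypothetical pair of functions that must be upgraded together.
(def encode (ref (fn [x] (str "v1:" x))))
(def decode (ref (fn [s] (subs s 3))))

;; A Ref implements IFn, so calling it invokes the function it currently holds.
(decode (encode "abc"))

(defn upgrade!
  "Install both new functions in a single transaction, so no other
  transaction can observe one new function next to one old function."
  []
  (dosync
    (ref-set encode (fn [x] (str "version2:" x)))
    (ref-set decode (fn [s] (subs s 9)))))
```

The transaction only protects the write side. A caller that reads the two refs separately, outside any transaction, could still pick up the old encode and the new decode.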

--
Jarkko

Laurent PETIT

Sep 4, 2009, 7:56:37 AM
to clo...@googlegroups.com
2009/9/4 Jarkko Oranen <chou...@gmail.com>


And still, if the calling side needs a consistent view of several functions, those calls also have to be made from within a dosync ...
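
One way this might look on the calling side, as a sketch only (encode, decode and round-trip are hypothetical names): read both refs inside one dosync so they come from the same snapshot, then call the captured functions outside the transaction so that a retry cannot repeat side effects.

```clojure
;; Hypothetical pair of functions held in refs.
(def encode (ref (fn [x] (str "v1:" x))))
(def decode (ref (fn [s] (subs s 3))))

(defn round-trip
  "Deref both refs in one transaction to get a matching pair,
  then apply them outside the transaction."
  [x]
  (let [[enc-fn dec-fn] (dosync [@encode @decode])]
    (dec-fn (enc-fn x))))

(round-trip "abc")
```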

Sean Devlin

Sep 4, 2009, 8:41:24 AM
to Clojure
Don't forget about classic stuff, like rigorous testing. I've tried
updating live production systems with Rails before, and it's burned me
(Think bringing down the entire production website). Small changes
you "are sure will work" often have subtle implications in different
environments. A few things I've been burned by before:

* Is your fix the right thing? We all make mistakes.
* Is the database in a similar state? Production might have nulls/
other weird values.
* Is the filesystem in a similar state? This is important if your
system depends on user files (like flickr).
* Is the JVM the same version?
* Are all the dependent .JARs the same revision?
* Are there differences between the development & production OS? (Not
as big a deal w/ Java, but still worth considering.)
* Are there configuration options on the production servers that are
different?
* Will your application source code be in the same state as the
production system? What happens when you restart? This is annoying
enough in development, let alone production.
* Will you lose the history of the bug? Since Clojure is functional,
it's really easy to recreate a unit test of the failure condition.
Make sure to capture the existence of the bug.
* Will your fix scale? Admittedly, this is harder to test than most
other things. I know a lot of my code is O(n) or O(n^2) first time
around, and I could easily speed it up an order of magnitude just by
indexing.
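
To illustrate the point about capturing the bug, here is a small sketch using clojure.test. The function line-total and the bug itself (a nil quantity in a production row) are invented for the example; the test pins down the failing input so it stays covered after the fix.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Hypothetical fixed function: the earlier version is assumed to have
;; thrown when a production row had a nil quantity.
(defn line-total [{:keys [price quantity]}]
  (* price (or quantity 0)))

;; Regression test that records the failure condition seen in production.
(deftest nil-quantity-regression
  (is (= 0 (line-total {:price 10 :quantity nil})))
  (is (= 30 (line-total {:price 10 :quantity 3}))))
```

Running (run-tests) in the namespace that defines the test reports whether the captured case still passes.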

So I guess my final response to deploying untested changes to a
production system is "Don't do it".

My $.02
Sean

Krukow

Sep 4, 2009, 8:59:53 AM
to Clojure
On Sep 4, 1:38 pm, Christophe Grand <christo...@cgrand.net> wrote:
> You have to be prepared to deal with potential inconsistencies: a
> closure (or any object) can hold a reference to the value of a
> function.

OK - I realized this already for running threads, e.g., executing a
function where a var means one thing the first time around and another
the second. But your example points out that laziness and closures
make this even worse :-(

I guess the conclusion is that you can't really fix live production
systems without some downtime without carefully designing the system
for this upfront (following certain conventions), i.e., you don't get
this *for free* by using a LISP (of course the development aspects of
change on-the-fly are still good).

Now I am a complete Erlang novice, but I think OTP has (some kind of)
built-in support for system upgrades. It might be worth checking out
to see if there is something we can adapt to Clojure?

/Karl

Chas Emerick

Sep 4, 2009, 9:14:22 AM
to clo...@googlegroups.com
On Sep 4, 1:38 pm, Christophe Grand <christo...@cgrand.net> wrote:
> You have to be prepared to deal with potential inconsistencies: a
> closure (or any object) can hold a reference to the value of a
> function.

In some circumstances, I am careful to pass vars rather than fns, if I
know they are going to be held. That makes things easier during
development, and when I want to roll in updates in production.
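
A small sketch of the difference, reusing foo from the example earlier in the thread (held-fn and held-var are hypothetical names): a Var is itself callable and looks up its current value on every call, so a holder that keeps the Var sees later redefinitions.

```clojure
(defn foo [x] (str x "v1"))

(def held-fn  foo)    ; captures the function object foo names right now
(def held-var #'foo)  ; captures the Var itself

(defn foo [x] (str x "v2"))

(held-fn :a)   ;=> ":av1"  (still the old function)
(held-var :a)  ;=> ":av2"  (the Var resolves to the new function)
```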

- Chas

Stuart Sierra

Sep 4, 2009, 9:44:03 AM
to Clojure
On Sep 4, 4:22 am, Krukow <karl.kru...@gmail.com> wrote:
> I was thinking about the capability of changing production systems on
> the fly. E.g. by having an accessible repl in a running production
> system.

This is a popular list question. The short answer is "no." It might
work for correcting a single, isolated function. But in general,
there are too many potential interactions to do this safely.

If you really want system-wide, real-time code upgrades, then Erlang
is your friend, but that's a radically different virtual machine
model, not easily replicated. Java servlet containers like Tomcat
support "hot deploy" to a limited extent, but it's tricky to use.

-SS

tmountain

Sep 4, 2009, 10:05:50 AM
to Clojure
Erlang allows two versions of a module to be stored in memory at any
given time. This allows you to do hot code swapping at runtime without
taking down the running server. Clojure can obviously do the same
thing, but Erlang offers a convenient builtin mechanism for shelling
into the running Erlang VM.

I've been thinking about how this would be accomplished in Clojure,
and it seems you could simply have a thread listening for a connection
and then write a small function to inject the new code across the
listening socket. This way, you could fire up your REPL, load your
library, and then just call your hot-swap function to send a new
callback to the running server without any additional dependencies
such as Nailgun (which I wouldn't use in production anyway).
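
For readers on a recent Clojure, a rough sketch of the listening side: clojure.core.server (added in Clojure 1.8, so later than this thread) can start a socket REPL without extra dependencies. The server name and the wrapper functions below are assumptions for illustration, not code from this discussion.

```clojure
(require '[clojure.core.server :as server])

(defn start-hot-fix-repl
  "Start a socket REPL on the given port (0 lets the OS pick a free one).
  Binds to the loopback address by default and returns the server socket."
  [port]
  (server/start-server {:name   "hot-fix-repl"              ; hypothetical name
                        :port   port
                        :accept 'clojure.core.server/repl}))

(defn stop-hot-fix-repl
  "Stop the socket REPL started by start-hot-fix-repl."
  []
  (server/stop-server "hot-fix-repl"))
```

Anyone who can reach the port can evaluate arbitrary code in the process, so leaving the address on loopback and tunnelling in is probably the safer default.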

-Travis

Richard Newman

Sep 4, 2009, 12:07:05 PM
to clo...@googlegroups.com
> Now I am a complete Erlang novice, but I think OTP has (some kind of)
> built-in support for system upgrades. It might be worth checking out
> to see if there is something we can adapt to Clojure?


There are several tiers of reliability that determine what kinds of
fixes/upgrades you can do.

At low levels (2-3 nines) you can run with one machine and fix it as
you go.

At 4-5 nines, you use redundant machines (and georedundancy), and
having a stable machine image is important for hardware swaps. You
never fix on the server unless the alternative is dropping calls, and
the customer is on the phone *right now*. (That's the worst
environment to figure out a fix, of course.)

Above that (6 nines) is the realm of ATC and telephony, where you
might not be able to afford to bring a system down at all. Erlang/OTP
was designed for this space, so it includes hot-swappable components,
though you test and verify them beforehand!
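
To put rough numbers on these tiers, "n nines" of availability leaves a fraction 10^-n of the year as permitted downtime. A quick sketch of the arithmetic, assuming a 365-day year (the function name is made up):

```clojure
(defn downtime-minutes-per-year
  "Maximum downtime, in minutes per 365-day year, at n nines of availability."
  [nines]
  (* 365 24 60 (Math/pow 10.0 (- nines))))

(downtime-minutes-per-year 3) ; about 525.6 minutes, just under 9 hours
(downtime-minutes-per-year 5) ; about 5.3 minutes
(downtime-minutes-per-year 6) ; about 0.53 minutes, roughly 32 seconds
```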

Re consistency: I seem to recall Pascal Costanza working on activation
of layers, so you can swap a whole set of stuff across your program.
He spoke about it at ILC2009, but I'm not sure I've found the right
paper. Common Lisp-specific, though.

-R

tmountain

Sep 4, 2009, 2:30:42 PM
to Clojure
I just put together some example code to demonstrate "hot updates"
with Clojure.

http://paste.lisp.org/display/86576

It allows you to connect to a REPL via port 12345 and dynamically
update things as necessary. To address the issue of updating multiple
definitions at once, you'd do something like the following (after
modifying main.clj):

travis@travis-desktop:~$ nc localhost 12345
clojure.core=> (require 'main :reload)

Right now the main thread simply prints message in a loop, but I've
tried changing main.clj to modify both the print-hello function and
value of message, and it worked great. You can also connect to the
repl and do something like the following, but it doesn't provide the
same safety as the seemingly atomic require function does.

clojure.core=> (ns main)
nil
main=> (def message "hola")
#'main/message

-Travis


ronen

Sep 5, 2009, 11:18:32 AM
to Clojure
Not Clojure specific, but the Spring framework has "refreshable beans"
support which enables partial code swap on production systems
(http://tiny.cc/3zctU). It's much more limited than Erlang but still
might prove to be useful.
