Everybody updates their code. Some more than others, and many during business hours. This leads to the inevitable conundrum of needing to reinit your ColdBox application to pick up said changes. If ColdBox is reinitted while a server is under load, some users may get errors, so it's very common to get questions about how to handle that.
Reinitting the framework while users are on a server is kind of like replacing the wings on an airplane while it's in the air. It's an inherently destructive operation. First, post shutdown hooks are called that shut down aspects like WireBox, LogBox, and CacheBox. Then the entire framework is destroyed and the application scope wiped clean so no bits are left behind and it can all be garbage collected.
Next, a fresh copy of the framework is built using the new settings, handlers, and services and put in place so it can start processing requests. That might only take a few seconds (or perhaps it takes a lot of seconds if you are loading and caching a ton of data). Either way, there are two types of requests that pose problems during that process: requests that were already underway when the reinit started, and requests that arrive while the framework is coming back up.
Sometimes, these requests are able to process through just fine if they miss the narrow window where the framework is coming back up. Other times they error out, and that might not be ideal for your users. There are three basic ways to approach reinits under load: the default ColdBox behavior, fail fast, and forced thread safety.
These each have their pros and cons which we'll discuss below.
So, before I go any further, I have to point out that this entire topic is really superseded by the following advice:
Don't reinit a server while it's under load. Ever.
The "correct" process is to reload code like so:
This is made incredibly easy with technology like Docker Swarm. In fact, every step above can be completely automated and happen in a number of seconds without any manual changes needed to a load balancer appliance. We are getting into Docker Swarm right now and it is solving so many of these deployment concerns for us that I wish it had existed years ago. However, not all of you have the luxury of controlling the entire environment that you're deploying to, or even choosing your technology stack. Many of you only really have control of the code, so that's why I'm still writing this guide :)
To facilitate this post, I have created a sample repo on GitHub.
https://github.com/bdw429s/coldbox-reinit-examples/
You can clone the repo, run box install and box start to fire up the examples, but I provided the repo mostly so you'd have the code for reference. There is a branch in the repo for each example. Here's what I have set up:
Let's take a look at what ColdBox does out of the box when a framework reinit happens. This is represented on the master branch of the repo.
Let's take a look at how the server behaves when a reinit comes in under load.
On this topic, there's also a nice module for ColdBox called "autoreload" that will fire a reinit for you when a special file's timestamp is updated. This is great for clustered deploys where you can't feasibly reinit all the servers manually. The module can be found here, and I added an extra branch to the repo called "auto-deploy" that shows this in action, along with a modified JMeter script that calls an action that just manually touches the timestamp on the deploy tag.
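To make the "touch a file to trigger a reinit" idea concrete, here is a minimal, language-neutral sketch in Python. It is not the module's actual code, and the names DeployTagWatcher, tag_path, and on_change are made up for illustration. It only shows the general pattern: remember the file's last modified time, compare it on each check, and fire a callback when it changes.

```python
import os


class DeployTagWatcher:
    """Fires a callback when a marker file's timestamp changes (illustrative only)."""

    def __init__(self, tag_path, on_change):
        self.tag_path = tag_path      # hypothetical path to the deploy tag file
        self.on_change = on_change    # hypothetical reinit callback
        self._last_seen = self._mtime()

    def _mtime(self):
        # A missing file is treated as "no timestamp" rather than an error.
        try:
            return os.stat(self.tag_path).st_mtime_ns
        except FileNotFoundError:
            return None

    def check(self):
        """Call this periodically or per request; returns True if a reinit fired."""
        current = self._mtime()
        if current != self._last_seen:
            self._last_seen = current
            self.on_change()
            return True
        return False
```

Each server in a cluster could run a check like this against a shared or freshly deployed tag file, so a single touch during the deploy reaches every node without anyone hitting each server by hand.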
Switch the Git repo to the fail-fast branch and you'll see an example of how to perform a fail-fast maneuver in which users hitting your site will get an immediate "under maintenance" message.
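If you just want the shape of the idea before opening the branch, here is a rough sketch in Python rather than CFML. It is not the code from the repo, and FailFastApp, handle_request, and reinit are hypothetical names. It assumes a simple "reloading" flag: while the flag is set, new requests get an immediate maintenance response instead of touching a half-built application.

```python
import threading


class FailFastApp:
    """Toy stand-in for an application that can be reinitialized."""

    def __init__(self):
        self._reloading = threading.Event()  # set while a reinit is running
        self.settings = {"version": 1}

    def handle_request(self, path):
        # Fail fast: do not wait for the reinit, answer right away.
        if self._reloading.is_set():
            return 503, "Site is under maintenance, please retry shortly."
        return 200, f"{path} served by version {self.settings['version']}"

    def reinit(self, new_settings, build=dict):
        # `build` stands in for the (possibly slow) framework startup.
        self._reloading.set()
        try:
            self.settings = build(new_settings)
        finally:
            # Always clear the flag, even if the rebuild throws.
            self._reloading.clear()
```

Note that the flag only protects requests that arrive after the reinit has begun. A request that was already in flight when the flag flipped can still trip over state that disappears underneath it.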
Let's take a look at how the server behaves when a reinit comes in under load with fail fast set up.
Switch the Git repo to the Forced-thread-safety branch to see what it takes to make your reinits never step on the toes of any other request without punting with an error message.
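The general technique here is a readers-writer lock: every normal request takes a shared lock, and the reinit takes an exclusive one. The sketch below shows that pattern in Python rather than CFML. It is not the code from the branch, and ReadWriteLock and ThreadSafeApp are illustrative names, so treat it as an assumption about the approach rather than the actual implementation.

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock: many readers or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # New requests wait while a reinit is running or queued.
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            try:
                # The reinit waits until every in-flight request has finished.
                while self._writer or self._readers:
                    self._cond.wait()
            finally:
                self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class ThreadSafeApp:
    """Toy application whose reinit never overlaps a running request."""

    def __init__(self):
        self._lock = ReadWriteLock()
        self.settings = {"version": 1}

    def handle_request(self, path):
        self._lock.acquire_read()
        try:
            return 200, f"{path} served by version {self.settings['version']}"
        finally:
            self._lock.release_read()

    def reinit(self, new_settings):
        self._lock.acquire_write()
        try:
            self.settings = dict(new_settings)
        finally:
            self._lock.release_write()
```

The trade-off is visible in acquire_write: the reinit cannot start until every running request has released its shared lock, and every new request queues up behind the reinit until it finishes.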
Let's take a look at how the server behaves when a reinit comes in under load with thread safe set up.
So, you aren't limited to these exact options; you can mix and match some techniques. For instance, the few errors in the fail fast method that come from requests already underway when the reinit started can be mitigated by combining fail fast with thread safe. Basically, you would gain the pro of having no errors, but you would take on the con of the reinit not being able to start until all running requests were done, which might take some time.
As you've probably guessed, the "correct" solution for you really depends on which of the pros you require and which of the cons you are willing to live with. Hopefully this post helps you understand the difference between the approaches. The code samples should give you an idea of how to implement them in your apps. Thanks for flying the friendly Internet with ColdBox, and if your future development plans bring you to MVC, we hope to see you again.