Hi,
Sorry for the long message below, but this needs some explaining. Hopefully someone's bored on a Saturday and doesn't mind taking a look (Kevin?)
I'm running Rubber 3.2.2 and am in the process of upgrading our application from Rails 4.2 to Rails 5. As a result of this, I need to bump our Ruby version from 2.1.2 to Ruby 2.3.1.
I was hoping to validate my thoughts with someone familiar with Rubber since I'm a bit of an amateur with it. This is the first time I've tried something like this.
Here's our current infrastructure setup:
- PostgreSQL on RDS (not managed via Rubber)
- 10 EC2 instances, one of which has the db:primary role. All instances also have the web and app roles
- 2 EC2 instances both of which have the sidekiq role
- 1 EC2 instance running Redis (not managed via Rubber)
- The web instances are behind several ELBs
So far, I have done the following:
In rubber-ruby.yml I:
- updated ruby_build_version to 20160602
- updated ruby_version to 2.3.1
I've also updated the Rails app itself for Rails 5.
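For reference, the rubber-ruby.yml change boils down to this (the file lives at config/rubber/rubber-ruby.yml in our project; values are the ones from the steps above):

```yaml
# config/rubber/rubber-ruby.yml
ruby_build_version: 20160602   # ruby-build release that knows how to build 2.3.1
ruby_version: 2.3.1            # was 2.1.2
```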
My goal is to get everything upgraded with as little downtime as possible, since we have a fairly busy application.
I see 2 possible approaches to doing the server upgrades:
The first:
- Merge my Rails 5 branch into master
- Go into offline mode
- cap rubber:sidekiq:quiet on the sidekiq workers so they stop accepting new jobs
- cap rubber:bootstrap across all instances to get the Ruby upgrade
- cap deploy across all instances to deploy the new code
- cap rubber:sidekiq:restart on the sidekiq workers
- Go back into online mode
- Hopefully everything works
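Spelled out as commands, I believe the first approach would look roughly like this. The FILTER_ROLES variable is how I understand Rubber's env-var task scoping works; please correct me if my setup should target instances differently:

```shell
# Quiet the sidekiq workers: finish in-flight jobs, accept no new ones
FILTER_ROLES=sidekiq cap rubber:sidekiq:quiet

# Rebuild Ruby (2.3.1) on every instance -- the slow, risky step
cap rubber:bootstrap

# Deploy the Rails 5 code everywhere
cap deploy

# Bring the workers back on the new code
FILTER_ROLES=sidekiq cap rubber:sidekiq:restart
```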
The second:
- Do not merge to master yet. Switch to my Rails 5 branch
- cap rubber:create new instances, one for each of the original instances.
- cap rubber:bootstrap all new instances. Then I should have 12 EC2 instances that are now running Ruby 2.3.1.
- Merge my Rails 5 branch into master
- Switch to master
- cap deploy all new instances.
- Do not add the new instances to the ELBs yet.
- Test new instances
- Put app in offline mode
- Remove old instances from ELBs
- Add new instances to ELBs
- Put app back in online mode
- Hopefully everything works
- Decommission old instances
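In command form, my rough plan for the second approach is below. The ALIAS/ROLES/FILTER variables are how I understand Rubber's instance creation and targeting, and the instance names (web11, worker11, ...) are made up for illustration:

```shell
# On the rails5 branch: create replacements with roles pinned explicitly
ALIAS=web11 ROLES=web,app cap rubber:create
ALIAS=worker11 ROLES=sidekiq cap rubber:create
# ...repeat for each of the 12 replacement instances...

# Install Ruby 2.3.1 on the new boxes only
FILTER=web11,worker11 cap rubber:bootstrap

# After merging rails5 -> master and switching to master:
# deploy to the new instances only (they're not in the ELBs yet)
FILTER=web11,worker11 cap deploy
```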
The problem that I see with the first approach is that it's a bit of a black box. I'm concerned about potential failures and downtime due to unforeseen issues.
The second approach seems logical, but also potentially problematic for the following reasons:
- The minute I cap deploy to the new sidekiq workers, they will begin processing jobs, which I don't want yet. I guess I could quiet them again right away, but maybe there's a better approach?
- I tested this approach with a single new instance on the web side of things, and it seemed to work, except that, for a reason I haven't been able to figure out, it also installed and started PostgreSQL, which is not what I want.
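On those two problems, here's what I'm considering trying. Task names and variables are as I understand them (and rubber:describe may not be the right task name for listing instances and roles), so please correct me:

```shell
# Problem 1: chain deploy and quiet in one invocation, so the new workers
# have no window between deploying and quieting in which to pick up jobs
# (cap 2 runs the tasks given on one command line in order)
FILTER=worker11,worker12 cap deploy rubber:sidekiq:quiet

# Problem 2: list instances and their assigned roles -- my guess is an
# unexpected db role on the new instance explains the PostgreSQL install
cap rubber:describe
```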
If anyone on the list has an opinion or advice on how best to go about this, I'd really appreciate hearing it!
Cheers,
Matt