This outlines our upgrade process:
https://gist.github.com/grahamb/98e72098d5a96915421b#package-upgrade-process. It's several years old and some of the mechanisms
have changed (e.g. we no longer use Capistrano), but the basics are the same. In general, we:
- Build a Canvas release on a separate build box (check out repos, bundle & npm install, compile assets) and generate an artifact (a tarball of that release). This ensures we're deploying the same built Canvas to all our environments.
- Deploy that artifact through our various test/stage/production environments (previously via Capistrano, now with Ansible).
- Code is pushed to servers (we have 20 app servers, two jobs servers) into a release directory (e.g. /var/rails/canvas/releases/20220217181417-sfu-release-2022-02-17-5 for our current release)
- We use a shared NFS mount for the Canvas files storage and symlink it into the release directory
- We rebuild brand assets on one of the jobs servers, copy those to another shared NFS mount, and symlink into public/dist/brandable_css
- Database migrations are run on one of our management (jobs) servers
- We have a "current" symlink (/var/rails/canvas/current) that we use as the Apache DocumentRoot (/var/rails/canvas/current/public to be precise). Passenger picks that up as the app root. As part of the deploy, we re-point that symlink at the new release.
- We have template config files for each environment and copy them into the config directory.
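To make the symlink step above concrete, here is a minimal Python sketch of re-pointing a "current" symlink at a new release directory. This is not our actual Ansible task; the function name `repoint_current` and the demo directory names are hypothetical. It assumes a POSIX filesystem, where renaming a temporary symlink over the existing one swaps it in a single step, so the web server never sees a missing path.

```python
import os
import tempfile
from pathlib import Path


def repoint_current(current_link: Path, new_release: Path) -> None:
    """Point `current_link` at `new_release` without a window where it is missing.

    Hypothetical helper for illustration: it builds a temporary symlink next
    to the real one and renames it into place. On POSIX, rename replaces the
    destination symlink itself rather than following it.
    """
    tmp_link = current_link.with_name(current_link.name + ".tmp")

    # Clear a leftover temporary link from an interrupted earlier run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()

    os.symlink(new_release, tmp_link)
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    # Demo in a throwaway directory; the release names are made up.
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        old_release = base / "releases" / "20220210000000-old-release"
        new_release = base / "releases" / "20220217181417-new-release"
        old_release.mkdir(parents=True)
        new_release.mkdir(parents=True)

        current = base / "current"
        os.symlink(old_release, current)

        repoint_current(current, new_release)
        print("current ->", os.readlink(current))
```

Because every release lives in its own timestamped directory, rolling back with this approach would just mean calling the same function with an older release path.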
The Ansible playbook we use to deploy is here; tasks are executed in sequence.
We keep five releases on disk (current and four previous). I don't think we've ever rolled back to one in nearly 10 years.
For logs, we have Rails logging to syslog. Syslog on each individual server is forwarded to our central logging service, so no logs are kept on local disk on those machines (technically they're in /var/log/messages, I guess, but that gets rotated by syslog). As well, the app servers forward the Canvas syslog facility to one of our Canvas management (jobs) servers, which aggregates all of the web and jobs logs into one log file on disk; it's the firehose of all Canvas app logs. We rotate that file daily, but it is also forwarded to our central logging service, where we've got at least a year, probably more, of archived logs if we ever need them.
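The release retention mentioned above (current plus four previous) could be automated along these lines. This is a sketch under assumptions, not what our playbook does: the function `prune_releases` and its parameters are hypothetical, and it relies on release directory names starting with a timestamp (as in 20220217181417-sfu-release-2022-02-17-5) so that sorting by name is the same as sorting by age.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def prune_releases(releases_dir: Path, current_link: Path, keep: int = 5) -> list[Path]:
    """Delete all but the `keep` newest release directories.

    Hypothetical helper: assumes directory names begin with a sortable
    timestamp. The release that `current_link` points at is never removed,
    even if it is older than the newest `keep` releases.
    """
    current = current_link.resolve()

    # Real directories only; skip stray symlinks and files.
    releases = sorted(
        (p for p in releases_dir.iterdir() if p.is_dir() and not p.is_symlink()),
        key=lambda p: p.name,
        reverse=True,  # newest first
    )

    removed = []
    for release in releases[keep:]:
        if release.resolve() == current:
            continue  # never delete the live release
        shutil.rmtree(release)
        removed.append(release)
    return removed
```

Calling `prune_releases(Path("/var/rails/canvas/releases"), Path("/var/rails/canvas/current"))` at the end of a deploy would leave the five newest releases in place.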