With 150 or so active users (students, staff and faculty) during a recent week-long intensive, we started the week with:
* Postgres canvas_production database, Cassandra analytics database, and Redis on one server (2 cores, 4 GB RAM)
* Delayed jobs on one server (1 core, 2 GB RAM)
* Two app servers (2 cores, 4 GB RAM each) behind a load balancer that also terminated SSL
* S3 for storage
After the first day it was clear we needed a third app server during peak hours, so we spun one up and saw response times improve. We also offloaded outgoing mail to Mailgun to ensure reliable delivery (I think we ended the week with about 10,000 emails sent from Canvas).
I can’t remember for sure, but we may have bumped the delayed jobs server up to 2 cores/4 GB for the intensive and then dropped it back down afterward (most of the year is asynchronous distance work for our students). At the smaller size, it sits at 98-100% RAM usage and its CPU usage runs quite high, but it gets the job done.
It’s possible a single 6- or 8-core app server with 12-16 GB of RAM would have worked just as well (or, for all I know, better), but this is what we did. We originally had everything on a single server when we were piloting 3 courses. The single biggest performance boost was moving delayed jobs to a standalone server. The catch is that those jobs use a lot of CPU and RAM, but not consistently: you can think you’re doing OK, and then the jobs kick in and the web app can grind to a halt, even if only a single user is accessing it. At least, that was our experience.
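For a rough sense of how the single-server alternative compares, the app-tier sizes listed in this post can simply be totaled. This is a back-of-the-envelope sketch only: the `Server` type and `totals` helper are hypothetical names made up for illustration, and adding up cores and RAM ignores real differences such as per-process overhead and how the load balancer spreads traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical record for one server: its role, CPU cores, and RAM in GB."""
    role: str
    cores: int
    ram_gb: int


def totals(servers):
    """Return (total cores, total RAM in GB) across a list of servers."""
    return sum(s.cores for s in servers), sum(s.ram_gb for s in servers)


# Sizes taken from the post: each app server has 2 cores and 4 GB RAM.
app_tier_start = [Server("app", 2, 4) for _ in range(2)]
# A third app server was added after the first day.
app_tier_peak = app_tier_start + [Server("app", 2, 4)]

print(totals(app_tier_start))  # (4, 8)
print(totals(app_tier_peak))   # (6, 12)
```

Three 2-core/4 GB app servers add up to 6 cores and 12 GB, the low end of the single-server option mentioned above; an 8-core/16 GB machine would match four of them on paper.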