20 users and "website under heavy load (queue full)" with 8GB 2 core EC2 ?


nagaedadm

Oct 25, 2021, 8:05:53 AM
to Canvas LMS Users
Hi, 

I got a server overload and ultimately 502 gateway error from my AWS ALB. 
The machine is specced with 8GB RAM and 2 cores and runs both the LMS and the RCE. Delayed jobs run separately.

The alarming thing is that only 18-20 users were using it simultaneously.
Running htop, I could see both cores hitting 100%.

Does anyone have a rough guide to what server sizing we are looking at to host 300-500 students in a school?

We are setting up an auto-scaling group to handle peak load but we didn't have any idea about estimating loads - until now.

Also, has anyone managed to successfully run the RCE on a separate instance? I kept getting Unauthorized, although hitting '/rce' would return 200 with the message "Hello, from RCE Service".

I suspect the RCE (process handled by PM2) and Canvas (handled by Passenger) are to blame. Maybe I need to run the RCE with Passenger and ditch PM2.

Nico López

Oct 26, 2021, 9:01:51 AM
to Canvas LMS Users
Hi, in our experience a Canvas server should have at least 4 vCPUs and 8GB RAM. We have a couple of them hosted at Hetzner with that configuration, each running Canvas, delayed jobs and RCE (handled by PM2). In some cases, with hundreds of concurrent users, we had to move to 8 vCPUs.

Graham Ballantyne

Oct 26, 2021, 12:55:40 PM
to 'Zachary Rollyson' via Canvas LMS Users
Canvas scales pretty well horizontally. At SFU we have 20 Canvas application servers, each with 4 vCPUs and 16GB of RAM. Delayed jobs are handled by two similarly configured servers dedicated to jobs and some other background tasks; these don't serve web traffic at all. This term we have over 28k unique student enrollments.

Our setup can usually handle our peak bursts of up to ~1500 simultaneous active users (as measured by Google Analytics realtime stats). However, we do still get Passenger queueing and the "site under heavy load" error periodically, usually at the start of our in-person class blocks and/or when large classes (hundreds of students) have a scheduled quiz.

At some point, adding more app servers or increasing existing server resources won't buy you much more improvement; the database will become your choke point. I think that's what we're starting to run into now, even with pgbouncer in place (which you should be running if you aren't already, unless you're using some magic AWS managed database service, i.e. not running the database yourself on a VM).

We run RCE on two separate VMs (in a load balancer pool) using Passenger. I can't see why you'd be having an issue with PM2, though.

– 
Graham Ballantyne 
Senior Systems Engineer —  IT Services 
Simon Fraser University


nagaedadm

Oct 31, 2021, 5:36:40 AM
to Canvas LMS Users
Thanks for your replies, Nico and Graham. Sorry I took so long to respond.

Firstly, we have deployed all of Canvas on AWS and use AWS-managed services where possible. I have not had success with Keyspaces (Cassandra) and therefore run Cassandra on an EC2 server. For small loads that should do, but I am keen to move it to Keyspaces in our next iteration.
DB bottlenecks should be resolvable by moving to a bigger AWS DB instance.

I do note that Aurora PostgreSQL was not a suitable flavour, whereas I had no issues launching Canvas on RDS PostgreSQL.

It certainly looks like we will be running a large bank of servers for our users. We may be heading to 1500 users and possibly 300 at once initially.
Your baseline figures at SFU (Graham) are very good to give us an idea of "users per server".
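
A rough sketch of that "users per server" arithmetic, using only the SFU figures quoted in this thread (20 app servers, ~1500 simultaneous users, 4 vCPUs / 16GB each). The function names and the 1.5x headroom factor are assumptions for illustration, and the result is a back-of-envelope starting point rather than a measured capacity for our own setup:

```python
import math

# Figures quoted for SFU earlier in this thread.
SFU_APP_SERVERS = 20
SFU_PEAK_USERS = 1500


def users_per_server(peak_users, servers):
    """Average simultaneous users handled per app server."""
    return peak_users / servers


def servers_needed(target_concurrent, per_server, headroom=1.5):
    """App servers needed for a target load.

    headroom is an assumed safety factor for bursts (e.g. a scheduled quiz);
    it is not a figure from the thread.
    """
    return math.ceil(target_concurrent * headroom / per_server)


per_server = users_per_server(SFU_PEAK_USERS, SFU_APP_SERVERS)
print(per_server)                                    # 75.0
print(servers_needed(300, per_server, headroom=1.0))  # 4
print(servers_needed(300, per_server))                # 6
```

So 300 simultaneous users would suggest roughly 4 to 6 app servers of the SFU size, before accounting for any differences in usage patterns or database capacity.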

Is there any setting to change in Passenger to improve performance? Thanks.
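
One commonly cited starting point for the Passenger process pool (the `passenger_max_pool_size` setting in the Nginx integration) is the rule of thumb from Passenger's optimization guide: budget about 75% of RAM for application processes and divide by the memory of one process. The sketch below applies it to the 8GB machine from this thread; the 600 MB per-process figure is purely an assumption and would need to be measured on a real Canvas install:

```python
def estimate_max_pool_size(total_ram_mb, ram_per_process_mb, ram_fraction=0.75):
    """Rule-of-thumb upper bound on Passenger app processes by memory.

    ram_fraction is the share of RAM budgeted for app processes (0.75 in the
    rule of thumb); ram_per_process_mb must be measured, not guessed.
    """
    return int(total_ram_mb * ram_fraction // ram_per_process_mb)


# 8GB machine from this thread; 600 MB per process is a hypothetical value.
print(estimate_max_pool_size(8192, 600))  # 10
```

This only bounds the pool by memory. With both cores already at 100%, adding processes is unlikely to help much on its own, so the estimate is more useful when sizing larger instances.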