4 GB of RAM no longer seems to be enough.


Neptronix

May 10, 2017, 6:03:00 AM
to Canvas LMS Users
Hi all. Per the Canvas LMS minimum requirements, I've stuck with 4 GB of memory on an Ubuntu 14.04 server since 2013, when our school started using Canvas. We hadn't had a problem with memory filling up until now. I've always followed the production install guide to the letter and don't mess around with the config files at all.

We installed a version from January 2017, and now I am seeing errors in the main terminal about Ruby processes being sacrificed by the OS due to memory constraints (the kernel's OOM killer). Eventually enough Ruby processes get killed that the whole system freezes until it is rebooted. There also appears to be a very slow memory leak.

I added a 1 GB swap file to the server, but I still see the occasional process get killed about once a week, so I have to go and reboot the server about once a month now.
We run the PostgreSQL service on the main box just like before. We have 24 active students and 20 courses.
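
For what it's worth, here is how I've been confirming that it really is the kernel's OOM killer doing the killing. These are just standard Ubuntu tools and log locations, nothing Canvas-specific:

# recent kernel messages about processes killed for memory
dmesg | grep -iE "out of memory|killed process"

# the same messages survive a reboot in the kernel log
grep -iE "out of memory|killed process" /var/log/kern.log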

Here is what htop shows me when I sort by memory:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
15205 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:02.43 Passenger RubyApp: /var/canvas
15200 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:02.48 Passenger RubyApp: /var/canvas
15203 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:00.00 Passenger RubyApp: /var/canvas
15206 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:00.00 Passenger RubyApp: /var/canvas
 1502 canvasuse  20   0  775M  281M  2168 S  0.0  7.1 12:39.25 delayed:wait:1~canvas_queue:0:max
 1506 canvasuse  20   0  775M  281M  2168 S  0.0  7.1  0:00.00 delayed:wait:1~canvas_queue:0:max
 1488 canvasuse  20   0  708M  277M  1964 S  0.3  7.0 12:52.50 delayed:wait:1~canvas_queue:0:max
 1491 canvasuse  20   0  708M  277M  1964 S  0.0  7.0  0:00.00 delayed:wait:1~canvas_queue:0:max
 1492 canvasuse  20   0  785M  273M  2168 S  0.0  6.9 12:32.11 delayed:wait:1~canvas_queue:0:max
 1494 canvasuse  20   0  785M  273M  2168 S  0.0  6.9  0:00.00 delayed:wait:1~canvas_queue:0:max
 1498 canvasuse  20   0  767M  266M  2160 S  0.0  6.8 12:51.98 delayed:wait:1~canvas_queue:0:max
 1500 canvasuse  20   0  767M  266M  2160 S  0.0  6.8  0:00.00 delayed:wait:1~canvas_queue:0:max
 1483 canvasuse  20   0  697M  262M  1240 S  0.0  6.6 11:12.19 delayed:wait:1~canvas_queue:0:10
 1487 canvasuse  20   0  697M  262M  1240 S  0.0  6.6  0:00.00 delayed:wait:1~canvas_queue:0:10
 1485 canvasuse  20   0  692M  260M  1276 S  0.3  6.6 11:04.01 delayed:wait:1~canvas_queue:0:10
 1490 canvasuse  20   0  692M  260M  1276 S  0.0  6.6  0:00.00 delayed:wait:1~canvas_queue:0:10
 1182 redis      20   0  312M  141M   788 S  0.0  3.6 16:43.25 /usr/bin/redis-server 127.0.0.1:6379
 1184 redis      20   0  312M  141M   788 S  0.0  3.6  0:00.00 /usr/bin/redis-server 127.0.0.1:6379
 1185 redis      20   0  312M  141M   788 S  0.0  3.6  0:00.00 /usr/bin/redis-server 127.0.0.1:6379
 1482 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:06.29 delayed_jobs_pool
 1167 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:18.51 delayed_jobs_pool
 1463 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:00.00 delayed_jobs_pool
23299 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:00.00 delayed_jobs_pool
 1496 postgres   20   0  253M 22388 14508 S  0.0  0.6  3:52.65 postgres: canvas canvas_production 127.0.0.1(41430) idle
 1501 postgres   20   0  253M 22200 14484 S  0.0  0.5  3:52.09 postgres: canvas canvas_production 127.0.0.1(41433) idle
 1510 postgres   20   0  253M 21888 14312 S  0.3  0.5  3:51.12 postgres: canvas canvas_production 127.0.0.1(41438) idle
 1508 postgres   20   0  251M 18780 13800 S  0.0  0.5  3:51.30 postgres: canvas canvas_production 127.0.0.1(41436) idle
 1497 postgres   20   0  250M 15024  9532 S  0.0  0.4  3:32.75 postgres: canvas canvas_production 127.0.0.1(41431) idle
15412 postgres   20   0  245M 14892 10376 S  0.0  0.4  0:00.11 postgres: canvas canvas_production 127.0.0.1(47504) idle
 1495 postgres   20   0  250M 14880  9612 S  0.0  0.4  3:33.21 postgres: canvas canvas_production 127.0.0.1(41428) idle
11303 www-data   20   0  268M 12292  7888 S  0.0  0.3  0:00.05 /usr/sbin/apache2 -k start
16155 www-data   20   0  268M 11896  7796 S  0.0  0.3  0:00.34 /usr/sbin/apache2 -k start
11302 www-data   20   0  268M 11896  7596 S  0.0  0.3  0:00.05 /usr/sbin/apache2 -k start
 1101 postgres   20   0  240M 11856 11284 S  0.0  0.3  0:11.63 postgres: checkpointer process

The Canvas delayed jobs queue appears to be a big memory hog, and so does Passenger.

Here is my delayed_jobs.yml:

production:
  workers:
  - queue: canvas_queue
    workers: 2
    max_priority: 10
  - queue: canvas_queue
    workers: 4
  # if set, workers will process this many jobs and then die, causing the pool
  # to spawn another worker. this can help return memory to the OS.
  # worker_max_job_count: 20
  #
  # if set, workers will die and re-spawn if they exceed this memory usage
  # threshold. they will only die between jobs, not during a job.
  # worker_max_memory_usage: 1073741824
  #
  # disable periodic jobs auditor -- this isn't normally necessary
  # disable_periodic_jobs: true

default:
  workers:
  - queue: canvas_queue

This is weird, because I see twice as many delayed_jobs processes in memory as are specified in the configuration file.
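
One thing worth ruling out is whether htop is simply listing each worker's threads on separate lines. A quick way to count the real processes (stock procps tools, nothing Canvas-specific):

# NLWP is the number of threads per process; ps prints one line per actual process
ps -eo pid,nlwp,rss,args --sort=-rss | grep -E "delayed|Passenger" | grep -v grep

# in htop itself, pressing H toggles whether userland threads are shown as separate lines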

That being said, what is the best way to tune the memory usage down? I don't want to have to move up to an 8 GB instance on Amazon and pay that much more money to host 20 users if I can avoid it. I also don't know what the negative consequences of changing things in delayed_jobs.yml might be.

Any help is appreciated.

Keyla Centeno Diaz

May 10, 2017, 5:12:31 PM
to Canvas LMS Users
I have the exact same problem with our AWS Canvas setup. We have 35 active students and one course running, and memory is being consumed like crazy. Any input on this would be appreciated.

Neptronix

May 11, 2017, 1:57:47 AM
to Canvas LMS Users
Does your server also periodically hang up? 

If I go into the EC2 console in Amazon, click the server name under Instances, and go to Actions > Instance Settings > Get Instance Screenshot, over time I will see messages from the Linux kernel about Ruby processes periodically being 'sacrificed' due to high memory usage. When our server became completely unresponsive, I would look at this screenshot and the root terminal would be full of these errors.

In my estimation, after about 10-15 Ruby processes have been killed, the server hangs and requires a hard reboot, because even SSH access stops working. That is very bad, since a hard reboot can corrupt data on the disk.

We added a 1 GB swap file to our server, and the rate at which 'out of memory' errors accumulate dropped to about 20% of what it was. That means I can pre-emptively reboot the server maybe every 2 months, but this is still not really acceptable.
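
In case anyone else wants to add swap, a minimal sketch of creating a 1 GB swap file on Ubuntu 14.04 looks something like this (the /swapfile path and the size are just examples):

# create and protect a 1 GB file to use as swap
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile

# format it as swap and enable it immediately
sudo mkswap /swapfile
sudo swapon /swapfile

# make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# verify
swapon -s
free -m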

Memory usage on our server usually sits around 2.5 GB right after a reboot and climbs to about 3.7 GB of the 4 GB after a couple of students or teachers have used it for the day. In general, there seems to be a very slow memory leak going on, plus what I can only assume are memory spikes.

Since I'm not aware of any version numbers... how recent is your install?

Neptronix

May 11, 2017, 5:46:46 AM
to Canvas LMS Users
Okay, an update... I tried tuning a few things.

/var/canvas/config/delayed_jobs.yml:

production:
  workers:
  - queue: canvas_queue
    workers: 2
    max_priority: 10
  - queue: canvas_queue
    workers: 3
  # if set, workers will process this many jobs and then die, causing the pool
  # to spawn another worker. this can help return memory to the OS.
  worker_max_job_count: 15
  #
  # if set, workers will die and re-spawn if they exceed this memory usage
  # threshold. they will only die between jobs, not during a job.
  worker_max_memory_usage: 1073741824
  #
  # disable periodic jobs auditor -- this isn't normally necessary
  # disable_periodic_jobs: true

default:
  workers:
  - queue: canvas_queue

As you can see, I cut the default workers down from 4 to 3. That translates to 6 delayed_jobs entries shown in top (not counting the priority-10 workers) instead of 8. So whatever worker count you set there effectively shows up doubled, as I thought.

worker_max_job_count was uncommented and changed from 20 to 15.
worker_max_memory_usage was uncommented; the default setting is 1 GB (1073741824 bytes).

I do not like modifying settings like these without knowing the potential downsides, but I believe these two options are designed to contain a memory leak; perhaps the Canvas developers put them in to guard against exactly this kind of condition.

Since Passenger seems to be the biggest memory hog of them all, I tried lowering its maximum pool size to 5 in /etc/apache2/sites-enabled/canvas.conf as well as /etc/apache2/mods-enabled/passenger.conf. After a complete reboot, this did not affect the maximum number of Passenger processes that can run. I have seen up to 10 instances of the memory-devouring 'Passenger RubyApp' process running at a time.
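
In case it helps anyone debugging the same thing, here is a rough way to check what Passenger thinks its pool size is versus what it is actually running (this assumes a standard Apache/Passenger install with the Passenger admin tools on the PATH):

# find every place a pool-size directive is set; a duplicate in another file may win
grep -RnE "PassengerMaxPoolSize|PassengerMinInstances" /etc/apache2/

# sanity-check the config and restart Apache rather than rebooting the whole box
sudo apache2ctl configtest
sudo service apache2 restart

# ask Passenger itself how many application processes are in the pool right now
sudo passenger-status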

Anyway, I'm waiting on results from these modifications. Maybe we have a fix, maybe we don't!

Keyla Centeno Diaz

May 11, 2017, 8:16:32 PM
to Canvas LMS Users
I've recently done the same thing you did. One thing I noticed is that I didn't have swap turned on, so every time my RAM took a dip I got those error pages. I have 40 active users, so we'll see how this turns out. Just out of curiosity, how much free RAM do you have when idle? I have about 700 MB, but background job processes make it dive a lot. I honestly think Canvas is not suited to running on 4 GB, especially if you have everything hosted in one instance.
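
If you want to compare numbers, something like this should show idle memory and confirm swap is actually on (plain stock tools; the cron entry and log path are just an example):

# overall memory picture; the -/+ buffers/cache line is the one to compare
free -m

# confirm swap is enabled and how big it is
swapon -s

# optional: log a sample every 10 minutes to catch a slow leak over a few days
echo '*/10 * * * * root (date && free -m) >> /var/log/memlog.txt' | sudo tee /etc/cron.d/memlog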

Nick Kokkos

May 12, 2017, 3:14:28 AM
to Canvas LMS Users
One thing I know for sure is that delayed_jobs uses more and more memory over time if you run in development mode. But you run in production, is that correct?

See this: https://github.com/collectiveidea/delayed_job/issues/823

But Instructure runs its own fork of delayed_job, so I can't say anything more.
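
If you want to double-check which environment Canvas is actually running under, something like this should tell you (assuming Passenger under Apache; directive names vary a bit between Passenger versions, and the init script path assumes you symlinked script/canvas_init as in the production install guide):

# Passenger sets the Rails environment via one of these directives; if unset, it defaults to production
grep -RnE "RailsEnv|RackEnv|PassengerAppEnv" /etc/apache2/

# the delayed_jobs daemon typically reads RAILS_ENV; check the init script if you installed it
grep -n "RAILS_ENV" /etc/init.d/canvas_init 2>/dev/null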


Neptronix

May 13, 2017, 10:37:53 AM
to Canvas LMS Users
Yeah, I run production mode.

Keyla Centeno Diaz

May 13, 2017, 11:32:46 PM
to Canvas LMS Users
What does your vmstat look like? Once mine starts swapping a little, it keeps swapping in at 2 kB/s and swapping out at 1 kB/s even when there is plenty of RAM free. vm.swappiness is set to 10, so that is not the issue here.
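
For reference, this is the kind of output I mean (standard procps/sysctl tools; the 5-second interval is arbitrary):

# the si/so columns are swap-in/swap-out per second, sampled every 5 seconds
vmstat 5

# confirm the current swappiness value
cat /proc/sys/vm/swappiness

# change it at runtime (add vm.swappiness=10 to /etc/sysctl.conf to persist it)
sudo sysctl vm.swappiness=10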

Neptronix

May 25, 2017, 2:52:09 PM
to Canvas LMS Users
Same settings as mine. Currently using 25% of a 1 GB swap, though it does vary.