4 GB of RAM no longer seems to be enough.


Neptronix

May 10, 2017, 6:03:00 AM
to Canvas LMS Users
Hi all. Per the Canvas LMS minimum requirements, I've stuck with 4 GB of memory on an Ubuntu 14.04 server since 2013, when our school started using Canvas. We hadn't had a problem with memory filling up until now. I've always followed the production install guide to the letter and don't mess around with the config files at all.

We installed a version from January 2017, and now I am seeing errors in the main terminal about Ruby processes being sacrificed by the OS due to memory constraints (the kernel's OOM killer). Eventually enough Ruby processes get killed that the whole system freezes until it is rebooted. There also appears to be a very slow memory leak.

I added a 1 GB swap file to the server, but I still see the occasional process get killed about once a week, so I have to go and reboot the server about once a month now.
We run the PostgreSQL service on the main box just like before. We have 24 active students and 20 courses.
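
For what it's worth, here is how I've been confirming that it really is the kernel's OOM killer doing the killing. These are just standard Ubuntu tools and log locations, nothing Canvas-specific:

# recent kernel messages about processes killed for memory
dmesg | grep -iE "out of memory|killed process"

# the same messages survive a reboot in the kernel log
grep -iE "out of memory|killed process" /var/log/kern.log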

Here is what htop shows me when I sort by memory:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
15205 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:02.43 Passenger RubyApp: /var/canvas
15200 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:02.48 Passenger RubyApp: /var/canvas
15203 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:00.00 Passenger RubyApp: /var/canvas
15206 canvasuse  20   0  807M  444M  4868 S  0.0 11.3  0:00.00 Passenger RubyApp: /var/canvas
 1502 canvasuse  20   0  775M  281M  2168 S  0.0  7.1 12:39.25 delayed:wait:1~canvas_queue:0:max
 1506 canvasuse  20   0  775M  281M  2168 S  0.0  7.1  0:00.00 delayed:wait:1~canvas_queue:0:max
 1488 canvasuse  20   0  708M  277M  1964 S  0.3  7.0 12:52.50 delayed:wait:1~canvas_queue:0:max
 1491 canvasuse  20   0  708M  277M  1964 S  0.0  7.0  0:00.00 delayed:wait:1~canvas_queue:0:max
 1492 canvasuse  20   0  785M  273M  2168 S  0.0  6.9 12:32.11 delayed:wait:1~canvas_queue:0:max
 1494 canvasuse  20   0  785M  273M  2168 S  0.0  6.9  0:00.00 delayed:wait:1~canvas_queue:0:max
 1498 canvasuse  20   0  767M  266M  2160 S  0.0  6.8 12:51.98 delayed:wait:1~canvas_queue:0:max
 1500 canvasuse  20   0  767M  266M  2160 S  0.0  6.8  0:00.00 delayed:wait:1~canvas_queue:0:max
 1483 canvasuse  20   0  697M  262M  1240 S  0.0  6.6 11:12.19 delayed:wait:1~canvas_queue:0:10
 1487 canvasuse  20   0  697M  262M  1240 S  0.0  6.6  0:00.00 delayed:wait:1~canvas_queue:0:10
 1485 canvasuse  20   0  692M  260M  1276 S  0.3  6.6 11:04.01 delayed:wait:1~canvas_queue:0:10
 1490 canvasuse  20   0  692M  260M  1276 S  0.0  6.6  0:00.00 delayed:wait:1~canvas_queue:0:10
 1182 redis      20   0  312M  141M   788 S  0.0  3.6 16:43.25 /usr/bin/redis-server 127.0.0.1:6379
 1184 redis      20   0  312M  141M   788 S  0.0  3.6  0:00.00 /usr/bin/redis-server 127.0.0.1:6379
 1185 redis      20   0  312M  141M   788 S  0.0  3.6  0:00.00 /usr/bin/redis-server 127.0.0.1:6379
 1482 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:06.29 delayed_jobs_pool
 1167 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:18.51 delayed_jobs_pool
 1463 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:00.00 delayed_jobs_pool
23299 canvasuse  20   0  740M 33476   552 S  0.0  0.8  0:00.00 delayed_jobs_pool
 1496 postgres   20   0  253M 22388 14508 S  0.0  0.6  3:52.65 postgres: canvas canvas_production 127.0.0.1(41430) idle
 1501 postgres   20   0  253M 22200 14484 S  0.0  0.5  3:52.09 postgres: canvas canvas_production 127.0.0.1(41433) idle
 1510 postgres   20   0  253M 21888 14312 S  0.3  0.5  3:51.12 postgres: canvas canvas_production 127.0.0.1(41438) idle
 1508 postgres   20   0  251M 18780 13800 S  0.0  0.5  3:51.30 postgres: canvas canvas_production 127.0.0.1(41436) idle
 1497 postgres   20   0  250M 15024  9532 S  0.0  0.4  3:32.75 postgres: canvas canvas_production 127.0.0.1(41431) idle
15412 postgres   20   0  245M 14892 10376 S  0.0  0.4  0:00.11 postgres: canvas canvas_production 127.0.0.1(47504) idle
 1495 postgres   20   0  250M 14880  9612 S  0.0  0.4  3:33.21 postgres: canvas canvas_production 127.0.0.1(41428) idle
11303 www-data   20   0  268M 12292  7888 S  0.0  0.3  0:00.05 /usr/sbin/apache2 -k start
16155 www-data   20   0  268M 11896  7796 S  0.0  0.3  0:00.34 /usr/sbin/apache2 -k start
11302 www-data   20   0  268M 11896  7596 S  0.0  0.3  0:00.05 /usr/sbin/apache2 -k start
 1101 postgres   20   0  240M 11856 11284 S  0.0  0.3  0:11.63 postgres: checkpointer process

The Canvas delayed jobs queue appears to be a big memory hog, and so does Passenger.

Here is my delayed_jobs.yml:

production:
  workers:
  - queue: canvas_queue
    workers: 2
    max_priority: 10
  - queue: canvas_queue
    workers: 4
  # if set, workers will process this many jobs and then die, causing the pool
  # to spawn another worker. this can help return memory to the OS.
  # worker_max_job_count: 20
  #
  # if set, workers will die and re-spawn if they exceed this memory usage
  # threshold. they will only die between jobs, not during a job.
  # worker_max_memory_usage: 1073741824
  #
  # disable periodic jobs auditor -- this isn't normally necessary
  # disable_periodic_jobs: true

default:
  workers:
  - queue: canvas_queue

This is weird, because I see twice as many delayed_jobs processes in memory as are specified in the configuration file.
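
One thing worth ruling out is whether htop is simply listing each worker's threads on separate lines. A quick way to count the real processes (stock procps tools, nothing Canvas-specific):

# NLWP is the number of threads per process; ps prints one line per actual process
ps -eo pid,nlwp,rss,args --sort=-rss | grep -E "delayed|Passenger" | grep -v grep

# in htop itself, pressing H toggles whether userland threads are shown as separate lines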

That being said, what is the best way to tune the memory usage down? I don't want to have to move up to an 8 GB instance on Amazon and pay that much more money to host 20 users if I can avoid it. I also don't know what the negative consequences of changing things in delayed_jobs.yml might be.

Any help is appreciated.

Keyla Centeno Diaz

May 10, 2017, 5:12:31 PM
to Canvas LMS Users
I have the exact same problem with our AWS Canvas setup. We have 35 active students and one course running, and memory is being consumed like crazy. Any input on this would be appreciated.

Neptronix

May 11, 2017, 1:57:47 AM
to Canvas LMS Users
Does your server also periodically hang up? 

If I go into the EC2 console in Amazon, click the server name under Instances, and go to Actions > Instance Settings > Get Instance Screenshot, over time I will see messages from the Linux kernel about Ruby processes periodically being 'sacrificed' due to high memory usage. When our server became completely unresponsive, I would look at this screenshot and the root terminal would be full of these errors.

In my estimation, after about 10-15 Ruby processes have been killed, the server hangs and requires a hard reboot, because even SSH access stops working. That is very bad, since a hard reboot can corrupt data on the disk.

We added a 1 GB swap file to our server, and the rate at which 'out of memory' errors accumulate dropped to about 20% of what it was. That means I can pre-emptively reboot the server maybe every 2 months, but this is still not really acceptable.
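
In case anyone else wants to add swap, a minimal sketch of creating a 1 GB swap file on Ubuntu 14.04 looks something like this (the /swapfile path and the size are just examples):

# create and protect a 1 GB file to use as swap
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile

# format it as swap and enable it immediately
sudo mkswap /swapfile
sudo swapon /swapfile

# make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# verify
swapon -s
free -m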

Memory usage on our server usually sits around 2.5 GB right after a reboot and climbs to about 3.7 GB of the 4 GB after a couple of students or teachers have used it for the day. In general, there seems to be a very slow memory leak going on, plus what I can only assume are memory spikes.

Since I'm not aware of any version numbers... how recent is your install?

Neptronix

May 11, 2017, 5:46:46 AM
to Canvas LMS Users
Okay, an update... I tried tuning a few things.

/var/canvas/config/delayed_jobs.yml:

production:
  workers:
  - queue: canvas_queue
    workers: 2
    max_priority: 10
  - queue: canvas_queue
    workers: 3
  # if set, workers will process this many jobs and then die, causing the pool
  # to spawn another worker. this can help return memory to the OS.
  worker_max_job_count: 15
  #
  # if set, workers will die and re-spawn if they exceed this memory usage
  # threshold. they will only die between jobs, not during a job.
  worker_max_memory_usage: 1073741824
  #
  # disable periodic jobs auditor -- this isn't normally necessary
  # disable_periodic_jobs: true

default:
  workers:
  - queue: canvas_queue

As you can see, I cut the default workers down from 4 to 3. That translates to 6 delayed_jobs entries shown in top (not counting the priority-10 workers) instead of 8. So whatever worker count you set there effectively shows up doubled, as I thought.

worker_max_job_count was uncommented and changed from 20 to 15.
worker_max_memory_usage was uncommented; the default setting is 1 GB (1073741824 bytes).

I do not like modifying settings like these without knowing the potential downsides, but I believe these two options are designed to contain a memory leak; perhaps the Canvas developers put them in to guard against exactly this kind of condition.

Since Passenger seems to be the biggest memory hog of them all, I tried lowering its maximum pool size to 5 in /etc/apache2/sites-enabled/canvas.conf as well as /etc/apache2/mods-enabled/passenger.conf. After a complete reboot, this did not affect the maximum number of Passenger processes that can run. I have seen up to 10 instances of the memory-devouring 'Passenger RubyApp' process running at a time.
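
In case it helps anyone debugging the same thing, here is a rough way to check what Passenger thinks its pool size is versus what it is actually running (this assumes a standard Apache/Passenger install with the Passenger admin tools on the PATH):

# find every place a pool-size directive is set; a duplicate in another file may win
grep -RnE "PassengerMaxPoolSize|PassengerMinInstances" /etc/apache2/

# sanity-check the config and restart Apache rather than rebooting the whole box
sudo apache2ctl configtest
sudo service apache2 restart

# ask Passenger itself how many application processes are in the pool right now
sudo passenger-status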

Anyway, I'm waiting on results from these modifications. Maybe we have a fix, maybe we don't!

Keyla Centeno Diaz

May 11, 2017, 8:16:32 PM
to Canvas LMS Users
I've recently done the same thing you did. One thing I noticed is that I didn't have swap turned on, so every time my RAM took a dip I got those error pages. I have 40 active users, so we'll see how this turns out. Just out of curiosity, how much free RAM do you have when idle? I have about 700 MB, but background job processes make it dive a lot. I honestly think Canvas is not suited to running on 4 GB, especially if you have everything hosted in one instance.
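
If you want to compare numbers, something like this should show idle memory and confirm swap is actually on (plain stock tools; the cron entry and log path are just an example):

# overall memory picture; the -/+ buffers/cache line is the one to compare
free -m

# confirm swap is enabled and how big it is
swapon -s

# optional: log a sample every 10 minutes to catch a slow leak over a few days
echo '*/10 * * * * root (date && free -m) >> /var/log/memlog.txt' | sudo tee /etc/cron.d/memlog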

Nick Kokkos

May 12, 2017, 3:14:28 AM
to Canvas LMS Users
One thing I know for sure is that delayed_jobs uses more and more memory over time if you run in development mode. But you run in production, is that correct?

See this: https://github.com/collectiveidea/delayed_job/issues/823

But Instructure runs its own fork of delayed_job, so I can't say anything more.
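
If you want to double-check which environment Canvas is actually running under, something like this should tell you (assuming Passenger under Apache; directive names vary a bit between Passenger versions, and the init script path assumes you symlinked script/canvas_init as in the production install guide):

# Passenger sets the Rails environment via one of these directives; if unset, it defaults to production
grep -RnE "RailsEnv|RackEnv|PassengerAppEnv" /etc/apache2/

# the delayed_jobs daemon typically reads RAILS_ENV; check the init script if you installed it
grep -n "RAILS_ENV" /etc/init.d/canvas_init 2>/dev/null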


Neptronix

May 13, 2017, 10:37:53 AM
to Canvas LMS Users
Yeah, I run production mode.

Keyla Centeno Diaz

May 13, 2017, 11:32:46 PM
to Canvas LMS Users
What does your vmstat look like? Once mine starts swapping a little, it keeps swapping in at 2 kB/s and swapping out at 1 kB/s even when there is plenty of RAM free. vm.swappiness is set to 10, so that is not the issue here.
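
For reference, this is the kind of output I mean (standard procps/sysctl tools; the 5-second interval is arbitrary):

# the si/so columns are swap-in/swap-out per second, sampled every 5 seconds
vmstat 5

# confirm the current swappiness value
cat /proc/sys/vm/swappiness

# change it at runtime (add vm.swappiness=10 to /etc/sysctl.conf to persist it)
sudo sysctl vm.swappiness=10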

Neptronix

May 25, 2017, 2:52:09 PM
to Canvas LMS Users
Same settings as mine. Currently using 25% of a 1 GB swap, though it does vary.