How to keep Canvas from running out of memory on a machine with 4 GB of RAM.

Neptronix

May 25, 2017, 3:03:41 PM

to Canvas LMS Users

I just wanted to give you guys an update on my previous thread.

Canvas was hanging due to out-of-memory errors with only 20 students active, on a box with 4 GB of RAM and the database running on the same machine. I would see messages about Linux sacrificing processes for memory gradually add up, and then the server would require a hard reboot.

I added a 1 GB swap file, and that roughly doubled the time before the server hung: it would bail every 2 weeks instead of once a week.

What finally worked was poking at Canvas' configuration files. Here are the settings that got Canvas' memory management in line:

/var/canvas/config/delayed_jobs.yml:

production:
  workers:
  - queue: canvas_queue
    workers: 2
    max_priority: 10
  - queue: canvas_queue
    workers: 3
  # if set, workers will process this many jobs and then die, causing the pool
  # to spawn another worker. this can help return memory to the OS.
  worker_max_job_count: 15
  #
  # if set, workers will die and re-spawn of they exceed this memory usage
  # threshold. they will only die between jobs, not during a job.
  worker_max_memory_usage: 1073741824
  #
  # disable periodic jobs auditor -- this isn't normally necessary
  # disable_periodic_jobs: true

default:
  workers:
  - queue: canvas_queue

As you can see, I cut the default workers down from 4 to 3. In top this shows up as 6 delayed-job processes instead of 8, not counting the priority-10 workers. So whatever worker count you request, you effectively get double, as I suspected.

worker_max_job_count was uncommented and changed from 20 to 15.

worker_max_memory_usage was uncommented and left at its default of 1 GB (1073741824 bytes).
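
For anyone sizing a similar box, here is a rough back-of-the-envelope sketch in Python of what these settings allow in the worst case. It assumes the doubling I saw in top applies to both worker pools, which I have not confirmed; the function name and the process_multiplier parameter are made up for illustration and are not part of Canvas. Also keep in mind that workers only check the limit between jobs, so it is not a hard cap.

```python
GIB = 1024 ** 3


def worst_case_worker_bytes(worker_counts, max_bytes_per_worker, process_multiplier=1):
    """Upper bound on worker memory if every process reaches its per-worker limit."""
    return sum(worker_counts) * process_multiplier * max_bytes_per_worker


# Values from the delayed_jobs.yml above: pools of 2 and 3 workers, 1 GiB limit each.
configured = [2, 3]
cap = 1073741824

# process_multiplier=2 models the observed doubling (an assumption, see above).
budget = worst_case_worker_bytes(configured, cap, process_multiplier=2)
print(budget / GIB)  # 10.0
```

So the memory limit bounds each worker on its own, not the total. On a 4 GB machine the workers could still add up to more than the RAM available, which is why the worker count matters as much as the per-worker limit.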

I do not know which setting tipped the scales, but we have 18 days of uptime without a single report of the Linux kernel killing a process for memory. Huge improvement.

Swap file usage is ~25% of 1 GB, and we have 1.6 GB of memory 'free'.
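
If you want to keep an eye on the same numbers without staring at top, a minimal Python sketch along these lines should do. It assumes a Linux box where /proc/meminfo is available; the sample text below is made up to resemble my figures and is not real output from our server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def swap_used_fraction(info):
    """Fraction of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file on a Linux machine."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Illustrative sample only; on a real server call read_meminfo() instead.
SAMPLE = """\
MemTotal:        4046288 kB
MemFree:         1677721 kB
SwapTotal:       1048572 kB
SwapFree:         786429 kB
"""

info = parse_meminfo(SAMPLE)
print(f"swap used: {swap_used_fraction(info):.0%}")
print(f"free memory: {info['MemFree'] / 1024 / 1024:.1f} GiB")
```

Run from cron every few minutes and logged, this gives you a trend to look at before the kernel starts killing things.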

My best guess is that Canvas' delayed jobs system has a slow memory leak and the uncommented options were the fix.

Anyway, happy Canvasing, and I hope this helps you as much as it did us.

Neptronix

Jun 8, 2017, 1:39:38 PM

to Canvas LMS Users

It has been a month by now, and these settings continue to work 100%. The Linux kernel is no longer intervening by killing tasks, and the system is happily servicing 30 students during the busiest time of the year.

I am a bit disappointed that I was the first person here to figure out how to get this newer version working without taking a dive. The stock config files need to be different, and needing a 4 GB Amazon server to service 30 students is rather ridiculous. We have another similar system that can service >2000 students with 2 GB of RAM, and that includes PHP, Apache, and MySQL running on one server.

Ruby with tons of libraries used is a real hog!

Neptronix

Aug 13, 2017, 3:41:03 PM

to Canvas LMS Users

Another update, after a hundred students hammered our Canvas server over the summer.
Canvas still has a memory leak or excessive memory use on 4 GB, despite my best tweaks.

I have to reboot the server every 3 months, or it just stops responding after about 10 Linux kernel 'child sacrifices' of processes, as it calls them.
It is redis-server that is getting killed now, with one instance of it managing to rack up 0.5 GB of memory usage.
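
To see which processes the kernel is picking on, a small Python sketch like this can tally the kills by process name. The "Killed process <pid> (<name>)" message shape is an assumption based on what kernels of this era print; check it against your own kernel log and adjust the pattern. The sample lines are made up for illustration and are not copied from our logs.

```python
import re
from collections import Counter

# Assumed message shape; kernels differ, so adjust the pattern to your logs.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def count_oom_kills(log_text):
    """Count OOM-killer victims by process name in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Illustrative sample only; feed it your real kernel log text instead.
SAMPLE_LOG = """\
[1234.5] Out of memory: Kill process 2211 (redis-server) score 130 or sacrifice child
[1234.6] Killed process 2211 (redis-server) total-vm:1048576kB, anon-rss:524288kB, file-rss:0kB
[5678.1] Out of memory: Kill process 3300 (ruby) score 210 or sacrifice child
[5678.2] Killed process 3300 (ruby) total-vm:2097152kB, anon-rss:912000kB, file-rss:0kB
"""

for name, kills in count_oom_kills(SAMPLE_LOG).most_common():
    print(name, kills)
```

Only the "Killed process" lines are counted, so each kill is tallied once even though the kernel logs two lines per event.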

This application can't properly run on a 4gb machine with a 1gb swap file.

I'm pretty dismayed that no developers have responded to any feedback or requests for help on this issue. They seem to have forgotten us open source users, not just on this thread, but on others.

vn...@yandex.com

Aug 13, 2017, 7:57:56 PM

to Canvas LMS Users

It is summertime in North America, and schools are not in session. I think it may simply be that people are on holiday!

Graham Ballantyne

Aug 14, 2017, 1:43:38 PM

to canvas-l...@googlegroups.com

Hi there,

I have to disagree with your assertion that "[t]hey seem to have forgotten us open source users". My university runs, as far as we know, the largest higher-ed open-source Canvas installation. We also have other dual-licensed (commercial & open-source) software in production here. Our current email solution is one such package, and in that case we use their commercial version. The support we've received from Instructure's engineering staff over the last five years has been a million times better than the commercial support we get from our email solution vendor (one of the reasons we're dumping them and going to Exchange). I've never had engineers from other projects contact me directly with heads-ups about incoming changes, or offer to jump on a google hangout to work through a weird, completely-our-problem issue.

No one at Instructure, or anyone at all for that matter, is under any obligation to provide any kind of support whatsoever. Heck, it's in the header of just about every single file in the Canvas code:

# Canvas is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
# details.

The engineers that do post here, and in IRC, do so on their own, because they truly do believe in the open-source aspect of Canvas. They do consider open-source users when designing new features – Quizzes.next, for example, which would have been much easier to build if they weren't going to open-source it.

One thing you need to consider is that your use case (using a single, resource-constrained server for your entire stack) is very different from what Instructure, or many other sites, run into. We're no different. As an example: Instructure's hosted Canvas uses S3 for file storage. Our site uses local file storage (because Reasons). We *constantly* run into weird little bugs with local storage; Instructure doesn't use it so it isn't as well-tested. At the end of the day they're a business and they've got to put their resources where it's going to make them money. The beauty of open-source is that since we've got a vested interest in making local storage better, we can fix it. When we do, we contribute it back so that others get the same benefit.

So, get involved! If you really think that there's a memory leak somewhere, profile it and track it down. Open an issue on the repo, or a PR with a fix.
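
As a first step before any real profiling, something like this rough Python sketch can show whether a given process keeps growing. It assumes Linux, where /proc/<pid>/status carries a VmRSS line; the function names and the sample numbers are made up for illustration, and this is no substitute for a proper Ruby memory profiler.

```python
def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def growth_per_sample(samples_kb):
    """Average change in kB between consecutive samples (first to last)."""
    if len(samples_kb) < 2:
        return 0.0
    return (samples_kb[-1] - samples_kb[0]) / (len(samples_kb) - 1)


def read_rss_kb(pid):
    """Read the resident set size of a live process on Linux."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


# Illustrative sample only; on a real box call read_rss_kb(pid) on a timer.
SAMPLE_STATUS = "Name:\truby\nVmRSS:\t  412340 kB\nThreads:\t5\n"
print(parse_vm_rss_kb(SAMPLE_STATUS))

# Three made-up readings taken at equal intervals.
print(growth_per_sample([400000, 410000, 430000]))
```

A process whose resident size climbs steadily across many jobs and never drops is the one worth attaching a profiler to.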

Now, a couple of practical things for your situation:

* Redis *wants* to use all the memory you can throw at it, and Canvas is a very heavy Redis user. The various guides (1, 2) for running Redis in production have tweaks that can help with resource usage, but they may not be what you want to do on a shared box.

* Postgres's process model (one process per connection) can really eat into your resources; putting pgbouncer in front of it may help.

* Putting delayed jobs on their own box is a best practice for a number of reasons, prime being that long-running jobs won't take resources away from your web requests.

I do hope you can find a solution to your problem, and hopefully the list can help you out as well.

Cheers,

Graham.
