Hi Folks,
I've brought this topic up a few times over the last few years. If you're self-hosting a Canvas install with even a small user base, you will have noticed that Canvas likes to use a lot of storage. Having several TB of network-attached, backed-up storage is expensive. The bulk of this, in our case, is student file submissions. This may be partly because we're an art & design school, so we get a lot of hi-res images and videos being posted. Instructor course files are also a big chunk, as are "disposable" files: system-generated files that a user downloads (probably) once and then no longer needs, such as submission exports, ePubs, and content migrations. These add up.
Our LMS Advisory Committee formulated a data retention policy to help curb the growth of our storage requirements. We decided to retain one full academic year of student submissions and four full academic years of course material. To enact this policy, we needed a way to actually delete the files that fall outside those windows.
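To give a rough idea of how a policy like this turns into cutoff dates, here is a minimal Ruby sketch. It is not part of FileZapper and doesn't touch Canvas at all. It assumes the academic year starts on September 1, and that "N full academic years" means the N completed years before the current, in-progress year (which is kept as well). The method names `academic_year_start` and `retention_cutoff` are made up for illustration, so adjust everything to your own calendar and policy.

```ruby
require "date"

# Assumption: the academic year starts on September 1. Change for your school.
ACADEMIC_YEAR_START_MONTH = 9
ACADEMIC_YEAR_START_DAY = 1

# Returns the first day of the academic year that contains `date`.
def academic_year_start(date)
  start = Date.new(date.year, ACADEMIC_YEAR_START_MONTH, ACADEMIC_YEAR_START_DAY)
  date >= start ? start : start.prev_year
end

# Returns the date before which files fall outside the retention window.
# Assumption: we keep the current (in-progress) academic year plus the
# `full_years` completed academic years immediately before it.
def retention_cutoff(full_years, today = Date.today)
  academic_year_start(today).prev_year(full_years)
end

today = Date.new(2024, 3, 15)
puts "Submissions created before #{retention_cutoff(1, today)} are past retention"
puts "Course files created before #{retention_cutoff(4, today)} are past retention"
```

With the example date of 2024-03-15, the current academic year started on 2023-09-01, so the submission cutoff lands on 2022-09-01 and the course material cutoff on 2019-09-01.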
Enter FileZapper: a class that you can include in your Canvas instance to delete files or replace them with a "placeholder" file indicating that the original file has been removed. I've taken some stabs at writing something like this in the past, but this is a much more thorough and well-considered approach. I've tested it in our staging environment and will be running it in production next week. I'll be adding documentation and examples shortly. I thought I would share in case anyone else finds it useful or would like to contribute improvements.