There are currently two memory compression methods in the Linux kernel, zswap and zram (zcache was removed in kernel 3.11). The former acts as a compressed cache for an existing swap space on the system, and the latter acts as a compressed in-RAM swap device that can also be used as a RAM disk (storing tmpfs). However, at least judged by the goal of reducing swapping pressure on systems that need it, there seems to be a night-and-day difference between the two technologies.
For context, I am on an old laptop from 2016 with 6 GB of 2133 MHz DDR4 RAM and an old Toshiba 5400 RPM HDD. Now, you would expect this machine to be the worst candidate for memory overcommit, and that attempting a memory-heavy workload on it is pointless; however, I can't explain the following behavior, and I would like clarification on how zswap works, plus some potential changes to amend the issue.
The following scenario: when I use ZRAM and set the size of the block device to roughly 200-225% of my physical memory with LZO-RLE compression (I also use the le9 patchset on the 5.4 LTS kernel), I can open 70+ Chromium tabs, a few QEMU/KVM virtual machines (with shared memory, of course) running Minecraft, a few other Electron programs, and a PDF viewer with more than 4 PDFs open. Eventually I have something like 6-9 GB compressed down to roughly 2-3 GB (I could spend less time compressing/decompressing with LZ4, or save more memory with ZSTD, but LZO-RLE remains a good balance between the two), and memory pressure is not necessarily overwhelming: I can still interact with my system and use applications without problems, and while kswapd's CPU usage might be higher than normal, it's nothing too painful.

The only problem is that if I invoke anything that causes heavy I/O to my HDD, such as installing a Steam game or writing to a flash drive with dd, the system's interactivity drops from 100 to 1*10^-4, because the dirty ratio gets exhausted, leaving very little room for the disk cache necessary to keep the system operating in a stable manner. So obviously, while ZRAM does let me get away with memory overcommit and effectively double or triple my memory by compressing inactive/idle pages while active memory stays uncompressed in RAM, the whole disk I/O thing is still a problem whenever I do disk operations. Not to mention that because ZRAM only supports the zsmalloc allocator (there was work back in 2019 to add support for the zpool API that zswap uses, but nothing came of it), there's no LRU eviction support, so LRU inversion is quite common.
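For reference, a zram swap device like the one described above can be set up through the standard sysfs interface; the size and priority below are illustrative, not my exact values:

```shell
# Load the zram module with a single device
modprobe zram num_devices=1

# The compression algorithm must be chosen before the size is set
echo lzo-rle > /sys/block/zram0/comp_algorithm

# ~200% of 6 GB physical RAM, as discussed above
echo 12G > /sys/block/zram0/disksize

# Format it and enable it as high-priority swap, so it is used before any disk swap
mkswap /dev/zram0
swapon --priority 100 /dev/zram0

# Inspect compressed vs. original data size
zramctl /dev/zram0
```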
Ok, so what's the alternative? I set up zswap with its parameters in GRUB and make a swapfile. Now, this is something I don't quite understand. When I use atop, I can monitor how much memory the zpool containing the compressed swap cache is using, but zswap doesn't seem to actually solve the "swapping on a slow HDD is painful" problem, for the following reason:
- Regardless of what `max_pool_percent` is set to, zswap always seems to move swap into the swapfile while at the same time compressing some of the swap data into the zpool. But if disk I/O occurs regardless, what's the point of zswap? Why doesn't zswap first take in the pages to be swapped through frontswap, compress them until the zpool reaches the limit defined by that parameter, and only then start writing pages from the zpool back to the disk? Isn't this what it's supposed to do? Or am I misinterpreting zswap's functionality?
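For anyone wanting to reproduce this, the setup I'm describing is just a few module parameters plus a backing swapfile (values illustrative; the compressor must be an algorithm your kernel's crypto API provides):

```shell
# Enable zswap at boot by adding these to GRUB_CMDLINE_LINUX in /etc/default/grub:
#   zswap.enabled=1 zswap.compressor=lzo-rle zswap.max_pool_percent=20

# The same module parameters can also be toggled at runtime:
echo 1  > /sys/module/zswap/parameters/enabled
echo 20 > /sys/module/zswap/parameters/max_pool_percent

# Create and enable the backing swapfile that zswap writes back to
dd if=/dev/zero of=/swapfile bs=1M count=8192
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
```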
This leads to the following behavior. First of all, I cannot replicate what I do with ZRAM on zswap, which is already a bit of a letdown. Second, let's say I have Discord open, and I increase vm.watermark_scale_factor to make kswapd more aggressive. I actively watch videos in Chromium and whatnot, and then when I go back to Discord, the system stalls/freezes for 2-3 seconds before Discord opens up. This shouldn't happen, as it doesn't happen with ZRAM, and the only explanation is that instead of reclaiming Discord's previously evicted pages from the zpool cache, the kernel reads them directly from the swapfile, which is a lot slower for obvious reasons. Why does this happen? Isn't zswap supposed to swap to the actual swap device only when necessary (the so-called pool exhaustion)?
What I've gotten out of this is that zswap is useless and ZRAM as swap is orders of magnitude superior in all ways, except for one annoying aspect (ZRAM gained CONFIG_ZRAM_WRITEBACK support in kernel 4.14, allowing it to write incompressible or idle data to a backing block device if needed, another contributing factor to zswap's reduced relevancy). This could explain why ZRAM is used on most Android phones and Chromebooks, and why Fedora uses it by default as of version 33.
EDIT: Ok, never mind what I just wrote, I think I found out why zswap wasn't working so well for me. It turns out that the Linux kernel's default LRU reclamation is very expensive and doesn't have a good idea of what to evict, so I found two solutions: increase le9's clean-pages protection (its clean_low, clean_min, and anon_min kbytes settings), or use the new MG-LRU improvement by Google. zswap has been working fantastically since then. I highly recommend anyone suffering from the same gripes I had to do themselves a favour and use aggressive le9 settings or MG-LRU.
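For the record, the two fixes look roughly like this; the sysctl names below are from the le9 patchset and may differ between its revisions, the values are illustrative, and MG-LRU requires a kernel built with CONFIG_LRU_GEN:

```shell
# le9: protect a floor of clean file pages (and some anon pages) from reclaim
sysctl -w vm.clean_low_kbytes=262144   # prefer anon reclaim below 256 MiB of clean file pages
sysctl -w vm.clean_min_kbytes=131072   # hard floor of 128 MiB of clean file pages
sysctl -w vm.anon_min_kbytes=65536     # protect 64 MiB of anonymous pages

# MG-LRU: enable it and keep the recent working set from being reclaimed
echo y    > /sys/kernel/mm/lru_gen/enabled
echo 1000 > /sys/kernel/mm/lru_gen/min_ttl_ms   # don't reclaim pages used within the last second
```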
I found an alternative to zram that you can use if you have a swap partition available, and it has some advantages. I used to think zram and zswap were mainly for low-RAM computers, but I recently ran into a situation where I needed 6 GB of RAM and 6 GB of swap. This led me to rethink having zram and zswap enabled even when you have quite a bit of memory.
So for a 2 GB RAM system, if you set the zswap allocation to 50% of RAM and assume a typical 2:1 compression ratio, you essentially end up with a 3 GB system (1 GB of normal RAM plus a 1 GB pool holding about 2 GB of compressed pages), although the zswap portion will run at slower speeds due to the time required for compression and decompression.
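The arithmetic behind that estimate, with the 2:1 compression ratio as an explicit assumption (the "3 GB" figure only holds if your data actually compresses that well):

```shell
total_mib=2048    # physical RAM
pool_pct=50       # zswap max_pool_percent
ratio=2           # assumed compression ratio

pool_mib=$(( total_mib * pool_pct / 100 ))                # RAM reserved for the pool: 1024 MiB
effective=$(( total_mib - pool_mib + pool_mib * ratio ))  # uncompressed RAM + compressed capacity
echo "${effective} MiB effective"                         # prints "3072 MiB effective"
```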
Yes, setting the zswap allocation to 100% is too good to be true. You have to remember that compressed RAM is slower, and it also increases CPU load, since every compression/decompression operation is done by the CPU. I think zswap works best when you have a fast CPU and little RAM, since the trade-off is worthwhile there: the CPU performance loss is not critical in that case. It would also work well with a slow CPU that has many cores, which is typical of servers, and that seems to be what zswap was designed for in the first place.
I mean to say, it is probably not even that zswap is slower and consumes CPU. I suspect that RAM and zswap differ in much the same way that RAM and normal swap differ, and again it is not about speed. Specifically, in a system with 2 GB of RAM where 20% (i.e. 0.4 GB) is used for zswap, the OS might consider only 1.6 GB of RAM available, and start swapping out pages when memory consumption approaches that 1.6 GB threshold. If I understand correctly, zswap is a cache for swap, and thus somewhat transparent to the OS; the OS does NOT see and operate as if it had 2.4 GB of RAM. So setting the zswap allocation to 100% might end badly (I may still try it, though). All that being said, I have found no proof of this hypothesis: the output of free shows neither 1.6 GB nor 2.4 GB, but still 2 GB of RAM on my computer.
zswap is a Linux kernel feature that provides a compressed write-back cache for swapped pages, as a form of virtual memory compression. Instead of moving memory pages to a swap device when they are to be swapped out, zswap performs their compression and then stores them into a memory pool dynamically allocated in the system RAM. Later writeback to the actual swap device is deferred or even completely avoided, resulting in a significantly reduced I/O for Linux systems that require swapping; the tradeoff is the need for additional CPU cycles to perform the compression.[1][2][3]
As a result of reduced I/O, zswap offers advantages to various devices that use flash-based storage, including embedded devices, netbooks and similar low-end hardware, as well as to other devices that use solid-state drives (SSDs) for storage. Flash memory has a limited lifespan due to its nature, so avoiding its use for swap space prevents it from wearing out quickly.[4]
zswap is integrated into the rest of Linux kernel's virtual memory subsystem using the API provided by frontswap, which is a mechanism of the Linux kernel that abstracts various types of storage that can be used as swap space.[5] As a result, zswap operates as a backend driver for frontswap by providing what is internally visible as a pseudo-RAM device. In other words, the frontswap API makes zswap capable of intercepting memory pages while they are being swapped out, and capable of intercepting page faults for the already swapped pages; the access to those two paths allows zswap to act as a compressed write-back cache for swapped pages.[1][6]
The maximum size of the memory pool used by zswap is configurable through the sysfs parameter max_pool_percent, which specifies the maximum percentage of total system RAM that can be occupied by the pool. The memory pool is not preallocated to its configured maximum size, and instead grows and shrinks as required. When the configured maximum pool size is reached as the result of performed swapping, or when growing the pool is impossible due to an out-of-memory condition, swapped pages are evicted from the memory pool to a swap device on the least recently used (LRU) basis. This approach makes zswap a true swap cache, as the oldest cached pages are evicted to a swap device once the cache is full, making room for newer swapped pages to be compressed and cached.[1][4][7]
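The tunable described above can be sketched as follows; the paths are the mainline sysfs and debugfs locations (debugfs must be mounted to read the statistics):

```shell
# Cap the compressed pool at 20% of total system RAM
echo 20 > /sys/module/zswap/parameters/max_pool_percent

# The pool grows on demand; current usage is exposed through debugfs
cat /sys/kernel/debug/zswap/pool_total_size   # bytes currently occupied by the pool
cat /sys/kernel/debug/zswap/stored_pages      # number of compressed pages cached
```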
zbud is a special-purpose memory allocator used internally by zswap for storing compressed pages, implemented as a rewrite of the zbud allocator used by Oracle's zcache,[8] which is another virtual memory compression implementation for the Linux kernel. Internally, zbud works by storing up to two compressed pages ("buddies", hence the allocator name) per physical memory page, which brings both advantages, due to easy coalescing and reuse of freed space, and disadvantages, due to possibly lower memory utilization. However, as a result of its design, zbud cannot allocate more memory space than would originally be occupied by the uncompressed pages.[3][9]