
Re: lightly loaded system eats swap space


Adam

Jun 17, 2018, 7:08:48 PM
On Sun, Jun 17, 2018 at 5:19 PM, tech-lists <tech-...@zyxst.net> wrote:

> Hello list,
>
> context is (server)
> freebsd-11-stable r333874, ZFS raidz1-0 (3x4TB disks), 128GB RAM, E5-2630
> @2.3GHz, generic kernel.
>
> There's one bhyve guest on this server (using 4x cpu and 16GB RAM, also
> freebsd-11-stable)
>
> There have been no special options for zfs configuration on the server,
> apart from several datasets having the compressed property set (lz4).
>
> The server runs nothing else really apart from sshd and it uses ntpd to
> sync local time.
>
> How come such a lightly loaded server with plenty of resources is eating
> up swap? If I run two bhyve instances, i.e. two of the same size as
> indicated above, so 32GB used for the bhyves, I'll get out-of-swapspace
> errors in the daily logs:
>
> +swap_pager_getswapspace(24): failed
> +swap_pager_getswapspace(24): failed
> +swap_pager_getswapspace(24): failed
>
> Here's top, with one bhyve instance running:
>
> last pid: 49494;  load averages: 0.12, 0.13, 0.88    up 29+11:36:06  22:52:45
> 54 processes: 1 running, 53 sleeping
> CPU: 0.4% user, 0.0% nice, 0.4% system, 0.3% interrupt, 98.9% idle
> Mem: 8664K Active, 52M Inact, 4797M Laundry, 116G Wired, 1391M Buf, 4123M Free
> ARC: 108G Total, 1653M MFU, 105G MRU, 32K Anon, 382M Header, 632M Other
> 103G Compressed, 104G Uncompressed, 1.00:1 Ratio
> Swap: 4096M Total, 3502M Used, 594M Free, 85% Inuse
>
> PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> 49491 root 1 4 0 16444K 12024K select 9 0:12 6.49% ssh
> 32868 root 12 20 0 9241M 4038M kqread 2 23.2H 1.30% bhyve
> 49490 root 1 20 0 10812K 6192K sbwait 5 0:02 0.88% sftp
>
> From the looks of it, a huge amount of RAM is wired. Why is that, and
> how would I debug it?
>

That seems to be shown in the output you provided:
ARC: 108G Total, 1653M MFU, 105G MRU, 32K Anon, 382M Header, 632M Other


>
> A server of similar spec which is running freebsd-current with seven bhyve
> instances doesn't have this issue:
>

Based on the output, neither RAM nor swap seems like a similar spec, so
I wonder if you could say what you mean by that.

--
Adam

tech-lists

Jun 18, 2018, 8:07:29 AM
On 18/06/2018 00:04, Adam wrote:
> Based on the output, neither RAM nor swap seems like a similar spec, so
> I wonder if you could say what you mean by that.

server with the problem:

last pid: 62387;  load averages: 0.07, 0.10, 0.08    up 30+01:30:01  12:46:40
48 processes: 1 running, 47 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 5404K Active, 1218M Inact, 4927M Laundry, 117G Wired, 1392M Buf, 2116M Free
ARC: 109G Total, 2982M MFU, 105G MRU, 288K Anon, 385M Header, 638M Other
104G Compressed, 105G Uncompressed, 1.01:1 Ratio
Swap: 4096M Total, 3383M Used, 713M Free, 82% Inuse

This server runs 1 bhyve instance using 16GB. It has 128GB RAM.

server without a problem:

last pid: 84491;  load averages: 0.03, 0.02, 0.02    up 17+14:39:31  12:47:33
27 processes: 1 running, 26 sleeping
CPU: % user, % nice, % system, % interrupt, % idle
Mem: 2915M Active, 25G Inact, 9947M Laundry, 28G Wired, 1572M Buf, 59G Free
ARC: 22G Total, 3379M MFU, 19G MRU, 781K Anon, 64M Header, 974K Other
22G Compressed, 29G Uncompressed, 1.33:1 Ratio, 121M Overhead
Swap: 35G Total, 3747M Used, 32G Free, 10% Inuse

This server runs 10 bhyve instances of various RAM sizes, from 4GB to
32GB. It also has 128GB RAM installed. This is what I mean by "similar
spec": similar in terms of installed RAM.

This server doesn't usually show swap as being in use, but I can
actually account for this, as there is memory overcommitment here.

My point is that the 'server with problem' should not show a resource
issue with just one bhyve instance using 16GB RAM, but apparently it
does, and I don't know why. I am starting the VMs by hand, without any
third-party application, and I'm not wiring memory on the command line.
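
(For reference: bhyve wires guest memory only when it is started with the
-S flag, so a start command without -S, along the lines of this purely
illustrative sketch, leaves guest RAM pageable:

bhyve -c 4 -m 16G -H -A ... guestname

The actual command line used on this host is not shown in the thread.)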

thanks,
--
J.

David Fullard

Jun 18, 2018, 8:28:35 AM
I've noticed you've got a rather large ZFS ARC. You could try limiting the ZFS max ARC size by setting the vfs.zfs.arc_max sysctl.
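
For example, a hedged sketch of that (the value here is illustrative, not
a recommendation for this particular host) in /boot/loader.conf:

# cap ARC at 64GB on a 128GB host; pick a value that leaves room for guests
vfs.zfs.arc_max="68719476736"

On newer FreeBSD releases vfs.zfs.arc_max can also be changed at runtime
via sysctl, though the ARC shrinks only gradually after lowering it.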
> A server of similar spec which is running freebsd-current with seven
> bhyve instances doesn't have this issue:
>
> last pid: 41904;  load averages: 0.26, 0.19, 0.15    up 17+01:06:11  23:14:13
> 27 processes: 1 running, 26 sleeping
> CPU: 0.1% user, 0.0% nice, 0.3% system, 0.0% interrupt, 99.6% idle
> Mem: 17G Active, 6951M Inact, 41G Laundry, 59G Wired, 1573M Buf, 1315M Free
> ARC: 53G Total, 700M MFU, 52G MRU, 512K Anon, 182M Header, 958K Other
> 53G Compressed, 69G Uncompressed, 1.30:1 Ratio, 122M Overhead
> Swap: 35G Total, 2163M Used, 33G Free, 6% Inuse

tech-lists

Jun 18, 2018, 8:32:13 AM
On 18/06/2018 09:08, Erich Dollansky wrote:
> Hi,
>
> On Sun, 17 Jun 2018 23:19:02 +0100
> tech-lists <tech-...@zyxst.net> wrote:
>
>> freebsd-11-stable r333874, ZFS raidz1-0 (3x4TB disks), 128GB RAM,
>> Swap: 4096M Total, 3502M Used, 594M Free, 85% Inuse
>
> this might not be related, but I noticed that your swap space is small
> compared to the RAM size. I noticed on a much smaller Raspberry Pi that
> it runs into trouble when there is no swap, even when there is enough
> RAM available. Is it easily possible for you to add some GB of swap
> space and let the machine run then?
>
> How much swap do the other machines have?

Hi,

Yes, the machine with the problem uses the default 4GB swap. That's all
the swap it has. The machine without issue has a swapfile installed on
an SSD in addition to the default 4GB swap.

problematic machine:
Device        512-blocks  Used  Avail  Capacity
/dev/ada0p3      8388608  3.3G   714M       83%

machine without a problem, it has a swapfile installed:
Device        512-blocks  Used  Avail  Capacity
/dev/ada0s1b     8262248  1.7G   2.2G       44%
/dev/md0        65536000  1.9G    29G        6%
Total           73798248  3.7G    32G       10%

I added the swapfile a long time ago on this machine due to the same issue.
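
For anyone wanting to replicate that setup, a typical md(4)-backed swap
file on FreeBSD looks something like the following sketch; the path and
size are illustrative assumptions, not this machine's actual values:

truncate -s 32G /usr/swap0
chmod 0600 /usr/swap0
mdconfig -a -t vnode -f /usr/swap0 -u 0
swapon /dev/md0

or, made persistent with a line in /etc/fstab:

md0	none	swap	sw,file=/usr/swap0,late	0	0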

But my problem isn't so much an out-of-swap-space problem; all this is
just a symptom. My problem is "why is it swapping out at all on a 128GB
system, and why is what's swapped out not being swapped back in again?"

tech-lists

Jun 18, 2018, 9:03:40 AM
On 18/06/2018 13:23, David Fullard wrote:
> I've noticed you've got a rather large ZFS ARC. You could try
> limiting the ZFS max ARC size by setting the vfs.zfs.arc_max sysctl.

I'll try this as soon as I can.

tech-lists

Jun 18, 2018, 9:14:45 AM
On 18/06/2018 13:45, Adam wrote:
> What is the output of sysctl vm.overcommit?

That's a sysctl I've not heard of. It's 0 on both machines.
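
For reference, that can be checked with:

$ sysctl vm.overcommit
vm.overcommit: 0

(0 is the default and means no strict swap accounting is enforced, i.e.
overcommit is allowed.)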

> If this system is intended to be a VM host, then why don't you
> limit ARC to something reasonable, like Total Mem - Projected VM Mem -
> Overhead = Ideal ARC?

I'll try doing that when I can take the machine out of service.
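
As a rough worked instance of that formula for this host (the overhead
figure is an assumption, not a measured number):

128GB total - 16GB projected VM - ~8GB overhead = ~104GB ideal ARC

which would translate into something like this in /boot/loader.conf:

vfs.zfs.arc_max="111669149696"   # ~104GB, illustrative value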

Erich Dollansky

Jun 18, 2018, 9:53:00 PM
Hi,

On Mon, 18 Jun 2018 13:27:23 +0100
tech-lists <tech-...@zyxst.net> wrote:

> On 18/06/2018 09:08, Erich Dollansky wrote:
> >
> > On Sun, 17 Jun 2018 23:19:02 +0100
> > tech-lists <tech-...@zyxst.net> wrote:
> >
> >> freebsd-11-stable r333874, ZFS raidz1-0 (3x4TB disks), 128GB RAM,
> >> Swap: 4096M Total, 3502M Used, 594M Free, 85% Inuse
> >
> > this might not be related, but I noticed that your swap space is
> > small compared to the RAM size. I noticed on a much smaller Raspberry
> > Pi that it runs into trouble when there is no swap, even when there
> > is enough RAM available. Is it easily possible for you to add some GB
> > of swap space and let the machine run then?
> >
> > How much swap do the other machines have?
>
> Yes, the machine with the problem uses the default 4GB swap. That's
> all the swap it has. The machine without issue has a swapfile
> installed on an SSD in addition to the default 4GB swap.
>
> problematic machine:
> Device        512-blocks  Used  Avail  Capacity
> /dev/ada0p3      8388608  3.3G   714M       83%
>
> machine without a problem, it has a swapfile installed:
> Device        512-blocks  Used  Avail  Capacity
> /dev/ada0s1b     8262248  1.7G   2.2G       44%
> /dev/md0        65536000  1.9G    29G        6%
> Total           73798248  3.7G    32G       10%
>
> I added the swapfile a long time ago on this machine due to the same
> issue.

So, the same effect as on a small Raspberry Pi.

It seems that you also use a memory disk for swap. Mine is backed by a
file via NFS.

>
> But my problem isn't so much an out-of-swap-space problem; all this
> is just a symptom. My problem is "why is it swapping out at all on a
> 128GB system, and why is what's swapped out not being swapped back in
> again?"
>
I wondered about this even on the small Raspberry Pi. The Raspberries
come with 1GB of RAM. Running just a compilation should never be a
problem, but sometimes it is.

A very long time ago - and not on FreeBSD, but maybe on a real BSD - I
worked with a system that swapped pages out just to bring them back in
as one contiguous block. This made a difference in those days. I do not
know if the code made it out of the university I was working at. I just
imagine now that the code made it out and is still in use, with the
opposite effect.

Erich

Jeremy Chadwick

Jun 19, 2018, 1:33:55 PM
(I am not subscribed to -stable, so please CC me, though I doubt I can
help in any way/shape/form past this Email)

Not the first time this has come up -- and every time it has, all that's
heard is crickets in the threads. Recent proof:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-April/088727.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-April/088728.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-June/089094.html

I sent private mail to Peter Jeremy about his issue. I will not
disclose that Email here. However, I will disclose the commits I
included in said Email that have touched ZFS ARC-related code:

http://www.freshbsd.org/commit/freebsd/r332785
http://www.freshbsd.org/commit/freebsd/r332552
http://www.freshbsd.org/commit/freebsd/r332540 (may help give insights)
http://www.freshbsd.org/commit/freebsd/r330061
http://www.freshbsd.org/commit/freebsd/r328235
http://www.freshbsd.org/commit/freebsd/r327491
http://www.freshbsd.org/commit/freebsd/r326619
http://www.freshbsd.org/commit/freebsd/r326427 (quota-related, maybe irrelevant)
http://www.freshbsd.org/commit/freebsd/r323667

In short (and nebulous as hell; sorry, I cannot be more specific given
the nature of the problem): there have been changes about ZFS's memory
allocation/releasing decision-making scheme compared to ZFS on "older"
FreeBSD (i.e. earlier 11.x, and definitely 10.x and 9.x).

Recommendations like "limit your ARC" are nothing new in FreeBSD, but
are still ridiculous kludges: tech-lists' system clearly has 105GB MRU
(MRU = most recently used) in ARC, meaning there is memory that can be
released back to the rest of the OS for general use (re: memory
contention/pressure situation), but the OS is choosing to use swap
instead, eventually exhausting it. That logic sounds broken, IMO. (And
yes, I did notice the size of the bhyve process.)

ZFS-related kernel folks need to be involved in this conversation. For
whatever reason, in the past several years, related committers are no
longer participating in this type of discussion. The opposite was
true back in the 7.x to 9.x days. The answers have to come from them.
I don't know, today, a) how they prefer these problems get reported to
them, or b) what exact information they want that can help narrow it
down (tech-lists' provided data is, IMO, good and par for the course).

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Cassiano Peixoto

Jun 19, 2018, 2:02:08 PM
Hi,

I have the very same issue on many servers, mainly on mail servers running
qmail+spamassassin+clamav. I've tuned some variables in loader.conf:

vfs.zfs.vdev.cache.size="2G"
vfs.zfs.arc_min="614400000"
vfs.zfs.arc_max="4915200000"
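
(For scale: those settings work out to roughly 0.6GB of arc_min and 4.6GB
of arc_max on this 8GB machine, plus a 2G vdev cache size.)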

But after some days, it begins eating swap and the system becomes very,
very slow, and then I need to reboot it.

My system config is:

FreeBSD 11.1-STABLE #5 r321625M: Thu Sep 21 16:01:56 -03 2017
ro...@mail.com:/usr/obj/usr/src/sys/MAIL amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM
4.0.0)
VT(vga): resolution 640x480
CPU: Intel(R) Atom(TM) CPU C2518 @ 1.74GHz (1750.04-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8

Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>

Features2=0x43d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,AESNI,RDRAND>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x101<LAHF,Prefetch>
Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8213245952 (7832 MB)

It's configured with 4GB of swap. Let me know if I can help with any other
information.

Thanks.

Paul van der Zwan

Jun 19, 2018, 3:33:58 PM
Hi
I had something similar on FreeNAS 11.1 (based on FreeBSD 11.1).
Swap was allocated and never released until it ran out.
Some time ago I set the following sysctl:
vm.disable_swapspace_pageouts: 1

That completely stopped swap allocation and I have not rebooted since, except for OS patching.
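
To keep that setting across reboots, the usual approach would be a line in
/etc/sysctl.conf matching the value above:

vm.disable_swapspace_pageouts=1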

From what I found using Google, this sysctl may have some nasty side
effects when the system runs out of memory, but that has not happened on
my system.

Paul

Volodymyr Kostyrko

Jun 19, 2018, 4:24:46 PM
19.06.18 20:29, Jeremy Chadwick wrote:
> (I am not subscribed to -stable, so please CC me, though I doubt I can
> help in any way/shape/form past this Email)
>
> Not the first time this has come up -- and every time it has, all that's
> heard is crickets in the threads. Recent proof:

I may sound lame, but I also faced this issue a few months ago. After a
few days of load, the system was trying to push more and more data into
swap, up to the point where the active window became too small to hold
all the programs, so they had to be brought back in from swap; and while
they were not running, something pinched the ARC, the ARC claimed the
memory, and so on…

I was never a fan of limiting things. If something requires limits, it
can easily be exploited. Should I switch from normal limits to other
limits when I, say, need to test something on 4 VMs?

So, while paging through documentation, I found a rather old memo in
tuning(7) about vm.swap_idle_enabled. I played a little with the
thresholds, but that only made things worse. I left vm.swap_idle_enabled
on and let the machine run. That was near January, I suppose. To my
amusement, the swap problems were gone. This doesn't mean swap wasn't
used; instead, the system survived weeks under irregular load without
issues. The only other change I made was bumping vfs.zfs.arc_free_target
a little higher than the default, to leave some space between the ARC
and the VM system so they wouldn't clash over memory so often.
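
In /etc/sysctl.conf terms, that approach would look something like the
following sketch; the arc_free_target figure is an illustrative assumption
since no exact value is given above, and the default is derived from
vm.v_free_target:

vm.swap_idle_enabled=1
vfs.zfs.arc_free_target=700000   # in pages; somewhat above the default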

Since then, all of my problems with swap have been forgotten. I'm not
sure which setting fixed it, nor am I sure it wasn't some recent
patches. I'm running 11-STABLE and rebuild the system at least once per
month.

Hope that can help someone. WBR.

--
Sphinx of black quartz judge my vow.

Shane Ambler

Jun 20, 2018, 1:13:24 AM
On 20/06/2018 02:59, Jeremy Chadwick wrote:

> In short (and nebulous as hell; sorry, I cannot be more specific given
> the nature of the problem): there have been changes about ZFS's memory
> allocation/releasing decision-making scheme compared to ZFS on "older"
> FreeBSD (i.e. earlier 11.x, and definitely 10.x and 9.x).

I would say the issues started with 10.x; I never had memory issues
using 9.x with ZFS. I first had all RAM marked wired when testing 10.1.

> Recommendations like "limit your ARC" are nothing new in FreeBSD, but
> are still ridiculous kludges: tech-lists' system clearly has 105GB MRU
> (MRU = most recently used) in ARC, meaning there is memory that can be
> released back to the rest of the OS for general use (re: memory
> contention/pressure situation), but the OS is choosing to use swap
> instead, eventually exhausting it. That logic sounds broken, IMO. (And
> yes, I did notice the size of the bhyve process.)

This review is aiming to fix this: https://reviews.freebsd.org/D7538

I have been running the patch on stable/11, and after eight days of
uptime I still have zero swap in use. I can't recall a time in the last
few years when I have had no swap usage past the first hour or two of
uptime.

As I have commented in that review, the issue I am seeing is that
arc_max is not counted when testing max_wired; the two together can add
up to more than physical RAM, and wiring all physical RAM can push the
RAM used by processes out to swap. I know that with 8G of physical RAM,
having over 7G wired is not recoverable.

max_wired seems to default to 30% of RAM (5G with 16G of RAM). I have
never seen this mentioned in any ZFS tuning guide; it should be
subtracted from physical RAM when calculating arc_max.

arc_max should never be greater than (kmem_size - max_wired - padding)
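
(As a rough worked instance of that bound, with assumed numbers for a 16G
machine: kmem_size ~16G, max_wired at 30% ~4.8G, and say 1G of padding
would put the safe ceiling for arc_max somewhere near 10G.)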

--
FreeBSD - the place to B...Storing Data
