Chasing OOM Issues - good sysctl metrics to use?


Pete Wright

Apr 21, 2022, 10:18:38 PM
to FreeBSD Current
hello -

on my workstation running CURRENT (amd64/32g of ram) i've been running
into a scenario where after 4 or 5 days of daily use I get an OOM event
and both chromium and firefox are killed.  then within the next day or so
the system becomes very unresponsive when i unlock my screensaver in the
morning, forcing a manual power cycle.

one thing i've noticed is growing swap usage but plenty of free and
inactive memory, as well as a GB or so of memory in the Laundry state
according to top.  my understanding is that seeing swap usage grow over
time is expected and doesn't necessarily indicate a problem.  but what
concerns me is the system locking up while seeing quite a bit of disk
i/o (maybe from paging back in?).

in order to help chase this down i've set up
prometheus_sysctl_exporter(8) to send data to a local prometheus
instance.  the goal is to examine memory utilization over time to help
detect any issues.  so my question is this:

what OIDs would be useful for diagnosing weird memory
issues like this?

i'm currently looking at:
sysctl_vm_domain_0_stats_laundry
sysctl_vm_domain_0_stats_active
sysctl_vm_domain_0_stats_free_count
sysctl_vm_domain_0_stats_inactive_pps
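
(for reference, a quick way to eyeball what the exporter emits for
these, assuming the default sysctl_-prefixed metric naming, is:)

$ prometheus_sysctl_exporter | grep '^sysctl_vm_domain_0_stats'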


thanks in advance - and i'd be happy to share my data if anyone is
interested :)
-pete

--
Pete Wright
pe...@nomadlogic.org
@nomadlogicLA


Mark Millard

Apr 22, 2022, 12:21:03 AM
to pe...@nomadlogic.org, freebsd-current
Pete Wright <pete_at_nomadlogic.org> wrote on
Date: Thu, 21 Apr 2022 19:16:42 -0700 :
Messages in the console out would be appropriate
to report. Messages might also be available via
the following at appropriate times:

# dmesg -a
. . .

or:

# more /var/log/messages
. . .

Generally messages from after the boot is complete
are more relevant.


Messages like the following are some examples
that would be of interest:

pid . . .(c++), jid . . ., uid . . ., was killed: failed to reclaim memory
pid . . .(c++), jid . . ., uid . . ., was killed: a thread waited too long to allocate a page
pid . . .(c++), jid . . ., uid . . ., was killed: out of swap space

(That last is somewhat of a misnomer for the internal
issue that leads to it.)

I'm hoping you got message(s) of one or more of the above
kinds. But others are also relevant:

. . . kernel: swap_pager: out of swap space
. . . kernel: swp_pager_getswapspace(7): failed

. . . kernel: swap_pager: indefinite wait buffer: bufobj: . . ., blkno: . . ., size: . . .

(Those messages do not announce a process kill but
give some evidence about context.)

Some of the messages with part of the text matching
actually identify somewhat different contexts --so
each message type is relevant.

There may be other types of messages that are relevant.

The sequencing of the messages could be relevant.
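A quick way to pull the relevant lines out of the log,
assuming the default /var/log/messages location:

# grep -E 'was killed:|swap_pager|swp_pager' /var/log/messages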

Do you have any swap partitions set up and in use? The
details could be relevant. Do you have swap set up
some other way than via swap partition use? No swap?

If 1+ swap partitions are in use, things that suggest
the speeds/latency characteristics of the I/O to the
drive could be relevant.
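
For example, something like the following shows the swap
layout and a rough view of per-device load:

# swapinfo -h
# gpart show
# gstat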

ZFS (so with ARC)? UFS? Both?

The first block of lines from a top display could be
relevant, particularly when it is clearly progressing
towards having the problem. (After the problem is too
late.) (I just picked top as a way to get a bunch of
the information all together automatically.)

These sorts of things might help folks help you.

===
Mark Millard
marklmi at yahoo.com


tech-lists

Apr 22, 2022, 4:41:21 PM
to freebsd...@freebsd.org
Hi,

On Thu, Apr 21, 2022 at 07:16:42PM -0700, Pete Wright wrote:
>hello -
>
>on my workstation running CURRENT (amd64/32g of ram) i've been running
>into a scenario where after 4 or 5 days of daily use I get an OOM event
>and both chromium and firefox are killed.  then in the next day or so
>the system will become very unresponsive in the morning when i unlock my
>screensaver in the morning forcing a manual power cycle.

I have the following set in /etc/sysctl.conf on a stable/13 workstation.
Am using zfs with 32GB RAM.

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1
vm.pageout_update_period=0
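
These apply at boot from /etc/sysctl.conf. To apply them without a
reboot, re-running the sysctl rc script should work, assuming all
three are runtime-writable:

# service sysctl restart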

Since setting these, OOM is a rarity. I don't profess to know exactly
what they do in detail, but my experience since they were set is hardly
any OOM, and big users of memory like firefox don't crash.
--
J.

Pete Wright

Apr 22, 2022, 7:44:28 PM
to Mark Millard, freebsd-current


On 4/21/22 21:18, Mark Millard wrote:
>
> Messages in the console out would be appropriate
> to report. Messages might also be available via
> the following at appropriate times:

that is what is frustrating.  i will get notification that the processes
are killed:
Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory

the system in this case had killed both firefox and chrome while i was
afk.  i logged back in and started them up to do more work, then the
next logline is from this morning when i had to force power off/on the
system as the keyboard and network were both unresponsive:

Apr 22 09:58:20 topanga syslogd: kernel boot file is /boot/kernel/kernel

> Do you have any swap partitions set up and in use? The
> details could be relevant. Do you have swap set up
> some other way than via swap partition use? No swap?
yes, i have 2GB of swap that resides on an nvme device.
> ZFS (so with ARC)? UFS? Both?

i am using ZFS and am setting vfs.zfs.arc.max to 10G.  i have also
experienced this crash with that set to the default unlimited value.

>
> The first block of lines from a top display could be
> relevant, particularly when it is clearly progressing
> towards having the problem. (After the problem is too
> late.) (I just picked top as a way to get a bunch of
> the information all together automatically.)

since the initial OOM events happen when i am AFK it is difficult to get
relevant stats out of top.

this is why i've started collecting more detailed metrics in
prometheus.  my hope is i'll be able to do a better job observing how my
system is behaving over time, in the run-up to the OOM event as well as
right before and after.  there are heaps of metrics collected though, so
i'm hoping someone can point me in the right direction :)

Pete Wright

Apr 22, 2022, 7:48:08 PM
to freebsd...@freebsd.org
nice, i will give those a test after my next crash, which will be by
next thurs if the pattern continues.

looking at the sysctl descriptions:
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
vm.pfault_oom_attempts: Number of page allocation attempts in page fault
handler before it triggers OOM handling
vm.pageout_update_period: Maximum active LRU update period
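
(for reference, the descriptions above come straight from sysctl -d:)

$ sysctl -d vm.pageout_oom_seq vm.pfault_oom_attempts vm.pageout_update_period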

i could certainly see how those could be helpful.  in an ideal world i'd
find the root cause of the system lock-ups, but it would be nice to just
move on from this :)

cheers,
-p

Mark Millard

Apr 22, 2022, 9:49:02 PM
to Pete Wright, freebsd-current
On 2022-Apr-22, at 16:42, Pete Wright <pe...@nomadlogic.org> wrote:

> On 4/21/22 21:18, Mark Millard wrote:
>>
>> Messages in the console out would be appropriate
>> to report. Messages might also be available via
>> the following at appropriate times:
>
> that is what is frustrating. i will get notification that the processes are killed:
> Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory

Those messages are not reporting being out of swap
as such. They are reporting sustained low free RAM
despite a number of less drastic attempts to gain
back free RAM (to above some threshold).

FreeBSD does not swap out the kernel stacks for
processes that stay in a runnable state: it just
continues to page. Thus just one large process
that has a huge working set of active pages can
lead to OOM kills in a context where killing no
other set of processes would gain back the free
RAM required. Such contexts are not really a
swap issue.

Based on there being only 1 "killed:" reason,
I have a suggestion that should allow delaying
such kills for a long time. That in turn may
help with investigating without actually
suffering the kills during the activity: more
time with low free RAM to observe.

Increase:

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

The default value was 12, last I checked.

My /boot/loader.conf contains the following relative to
that and another type of kill context (just comments
currently for that other type):

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

There is no value of vm.pageout_oom_seq that
disables the mechanism. But you can set large
values, like I did --or even larger-- to
wait for more attempts to free some RAM before
the kills. Some notes about that follow.

The 120 I use allows even low end arm Small
Board Computers to manage buildworld buildkernel
without such kills. The buildworld buildkernel
completion is sufficient that the low-free-RAM
status is no longer true and the OOM attempts
stop --so the count goes back to 0.

But those are large but finite activities. If
you want to leave something running for days,
weeks, months, or whatever that produces the
sustained low free RAM conditions, the problem
will eventually happen. Ultimately one may have
to exit and restart such processes once in a
while, exiting enough of them to give a little
time with sufficient free RAM.


> the system in this case had killed both firefox and chrome while i was afk. i logged back in and started them up to do more more, then the next logline is from this morning when i had to force power off/on the system as they keyboard and network were both unresponsive:
>
> Apr 22 09:58:20 topanga syslogd: kernel boot file is /boot/kernel/kernel
>
>> Do you have any swap partitions set up and in use? The
>> details could be relevant. Do you have swap set up
>> some other way than via swap partition use? No swap?
> yes i have a 2GB of swap that resides on a nvme device.

I assume a partition style. Otherwise there are other
issues involved --that likely should be avoided by
switching to partition style.

>> ZFS (so with ARC)? UFS? Both?
>
> i am using ZFS and am setting my vfs.zfs.arc.max to 10G. i have also experienced this crash with that set to the default unlimited value as well.

I use ZFS on systems with at least 8 GiBytes of RAM,
but I've never tuned ZFS. So I'm not much help for
that side of things.

For systems with under 8 GiBytes of RAM, I use UFS
unless doing an odd experiment.

>> The first block of lines from a top display could be
>> relevant, particularly when it is clearly progressing
>> towards having the problem. (After the problem is too
>> late.) (I just picked top as a way to get a bunch of
>> the information all together automatically.)
>
> since the initial OOM events happen when i am AFK it is difficult to get relevant stats out of top.

If you use vm.pageout_oom_seq=120 (or more) and check once
in a while, I expect you would end up seeing the activity
in top without suffering a kill in short order. Once noticed,
you could start your investigation, including via top if
desired.

> this is why i've started collecting more detailed metrics in prometheus. my hope is i'll be able to do a better job observing how my system is behaving over time, in the run up to the OOM event as well as right before and after. there are heaps of metrics collected though so hoping someone can point me in the right direction :)

I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
so you do not have to have identified everything up front
and can explore easier.


Note that vm.pageout_oom_seq is both a loader tunable
and a writeable runtime tunable:

# sysctl -T vm.pageout_oom_seq
vm.pageout_oom_seq: 120
amd64_ZFS amd64 1400053 1400053 # sysctl -W vm.pageout_oom_seq
vm.pageout_oom_seq: 120

So you can use it to extend the time when the
machine is already running.
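
For example, to bump it on a live system:

# sysctl vm.pageout_oom_seq=120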

Jeffrey Bouquet

Apr 23, 2022, 12:37:48 PM
to Pete Wright, freebsd-current
I don't know if it would help this discussion, but I installed security/picocrypt
and memory usage started going through the roof*. Maybe that port could be used to
test any fix found in this thread; iow, if that port runs better with/without some
sysctl, it could be used as a test case for the issues discussed here so far.
* and I had to deinstall. FWIW I had set the three vm sysctls listed above a short
while prior.

Pete Wright

Apr 23, 2022, 1:28:11 PM
to Mark Millard, freebsd-current
Thank you for this clarification/explanation - that totally makes sense!

>
> Based on there being only 1 "killed:" reason,
> I have a suggestion that should allow delaying
> such kills for a long time. That in turn may
> help with investigating without actually
> suffering the kills during the activity: more
> time with low free RAM to observe.

Great idea, thank you!  and thanks for the example settings and
descriptions as well.
> But those are large but finite activities. If
> you want to leave something running for days,
> weeks, months, or whatever that produces the
> sustained low free RAM conditions, the problem
> will eventually happen. Ultimately one may have
> to exit and restart such processes once and a
> while, exiting enough of them to give a little
> time with sufficient free RAM.
perfect - since this is a workstation my run-time for these processes is
probably a week as i update my system and pkgs over the weekend, then
dog food current during the work week.

>> yes i have a 2GB of swap that resides on a nvme device.
> I assume a partition style. Otherwise there are other
> issues involved --that likely should be avoided by
> switching to partition style.

so i kinda lied - initially i had just a 2G swap, but i added a second
20G swap a while ago to have enough space to capture some cores while
testing drm-kmod work.  based on this comment i am going to only use the
20G file backed swap and see how that goes.

this is my fstab entry currently for the file backed swap:
md99 none swap sw,file=/root/swap1,late 0 0


>
>>> ZFS (so with ARC)? UFS? Both?
>> i am using ZFS and am setting my vfs.zfs.arc.max to 10G. i have also experienced this crash with that set to the default unlimited value as well.
> I use ZFS on systems with at least 8 GiBytes of RAM,
> but I've never tuned ZFS. So I'm not much help for
> that side of things.

since we started this thread I've gone ahead and removed the zfs.arc.max
setting since it's cruft at this point.  i initially added it to test a
configuration i deployed to a server hosting a bunch of VMs.

> I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
> so you do not have to have identified everything up front
> and can explore easier.
>
>
> Note that vm.pageout_oom_seq is both a loader tunable
> and a writeable runtime tunable:
>
> # sysctl -T vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
> amd64_ZFS amd64 1400053 1400053 # sysctl -W vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
>
> So you can use it to extend the time when the
> machine is already running.

fantastic.  thanks again for taking your time and sharing your knowledge
and experience with me Mark!

these types of journeys are why i run current on my daily driver; it
really helps me better understand the OS so that i can be a better admin
on the "real" servers i run for work.  it's also just fun to learn stuff
too heh.

-p

Mark Millard

Apr 23, 2022, 3:34:43 PM
to Pete Wright, freebsd-current
I think you may have taken my suggestion backwards . . .

Unfortunately, vnode (file) based swap space should be *avoided*
and partitions are what should be used in order to avoid deadlocks:

On 2017-Feb-13, at 7:20 PM, Konstantin Belousov <kostikbel at gmail.com> wrote
on the freebsd-arm list:

QUOTE
swapfile write requires the write request to come through the filesystem
write path, which might require the filesystem to allocate more memory
and read some data. E.g. it is known that any ZFS write request
allocates memory, and that write request on large UFS file might require
allocating and reading an indirect block buffer to find the block number
of the written block, if the indirect block was not yet read.

As result, swapfile swapping is more prone to the trivial and unavoidable
deadlocks where the pagedaemon thread, which produces free memory, needs
more free memory to make a progress. Swap write on the raw partition over
simple partitioning scheme directly over HBA are usually safe, while e.g.
zfs over geli over umass is the worst construction.
END QUOTE

The developers handbook has a section debugging deadlocks that he
referenced in a response to another report (on freebsd-hackers).

https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
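
For reference, a partition-backed swap entry in /etc/fstab
looks like the following (the device name here is just an
illustration, not your actual layout):

/dev/nvd0p3  none  swap  sw  0  0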

>>>> ZFS (so with ARC)? UFS? Both?
>>> i am using ZFS and am setting my vfs.zfs.arc.max to 10G. i have also experienced this crash with that set to the default unlimited value as well.
>> I use ZFS on systems with at least 8 GiBytes of RAM,
>> but I've never tuned ZFS. So I'm not much help for
>> that side of things.
>
> since we started this thread I've gone ahead and removed the zfs.arc.max setting since its cruft at this point. i initially added it to test a configuration i deployed to a sever hosting a bunch of VMs.
>
>> I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
>> so you do not have to have identified everything up front
>> and can explore easier.
>>
>>
>> Note that vm.pageout_oom_seq is both a loader tunable
>> and a writeable runtime tunable:
>>
>> # sysctl -T vm.pageout_oom_seq
>> vm.pageout_oom_seq: 120
>> amd64_ZFS amd64 1400053 1400053 # sysctl -W vm.pageout_oom_seq
>> vm.pageout_oom_seq: 120
>>
>> So you can use it to extend the time when the
>> machine is already running.
>
> fantastic. thanks again for taking your time and sharing your knowledge and experience with me Mark!
>
> these types of journeys are why i run current on my daily driver, it really helps me better understand the OS so that i can be a better admin on the "real" servers i run for work. its also just fun to learn stuff too heh.
>


Pete Wright

Apr 23, 2022, 10:22:10 PM
to Mark Millard, freebsd-current


On 4/23/22 12:31, Mark Millard wrote:
> I think you may have taken my suggestion backwards . . .
>
> Unfortunately, vnode (file) based swap space should be *avoided*
> and partitions are what should be used in order to avoid deadlocks:
>
> On 2017-Feb-13, at 7:20 PM, Konstantin Belousov <kostikbel at gmail.com> wrote
> on the freebsd-arm list:
>
> QUOTE
> swapfile write requires the write request to come through the filesystem
> write path, which might require the filesystem to allocate more memory
> and read some data. E.g. it is known that any ZFS write request
> allocates memory, and that write request on large UFS file might require
> allocating and reading an indirect block buffer to find the block number
> of the written block, if the indirect block was not yet read.
>
> As result, swapfile swapping is more prone to the trivial and unavoidable
> deadlocks where the pagedaemon thread, which produces free memory, needs
> more free memory to make a progress. Swap write on the raw partition over
> simple partitioning scheme directly over HBA are usually safe, while e.g.
> zfs over geli over umass is the worst construction.
> END QUOTE
>
> The developers handbook has a section debugging deadlocks that he
> referenced in a response to another report (on freebsd-hackers).
>
> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks

d'oh - thanks for the correction!

-pete

Michael Jung

Apr 24, 2022, 12:15:59 AM
to Mark Millard, Pete Wright, freebsd-current

Hi:

I guess I'm kind of hijacking this thread, but I think what I'm
going to ask is close enough to the topic at hand to ask here instead
of starting a new thread and referencing this one.

I use sysutils/monit to watch running processes and then restart them
as I want.

I use protect(1) to make sure that monit does not die.

In /etc/rc.local: "protect -i monit"

protect seems in the end to simply call

     PROC_SPROTECT with the INHERIT flag, as documented in procctl(2)

So I followed a bit of the code, and I guess that's cool if I got it
right, but I know about .0001% about system internals.

Can anyone speak to how protect(1) works, and is it in itself prone to
what has been discussed?

For my use case, is protect "good enough", or do I really need tuning
like has been talked about?

If protect is the right answer, and someone could explain how it does
its thing at a slightly higher technical level, I would love to hear
more about why I'm either doing it wrong, or that what I'm doing is ok,
or why I should really be doing something completely different and why
I should be doing it differently.

I suspect there are many that would like to know this but would never ask,
at least not on list.
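
For concreteness, besides the rc.local line above, protect(1) can also
be applied to an already-running instance; the -p form below and the
pgrep match are just an illustration:

# protect -i -p "$(pgrep -x monit)"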

 

Always the seeker of new knowledge.

 

Thanks in advance.

 

--mikej

 

 

 


Pete Wright

Apr 29, 2022, 2:10:08 PM
to Mark Millard, freebsd-current


On 4/23/22 19:20, Pete Wright wrote:
>
>> The developers handbook has a section debugging deadlocks that he
>> referenced in a response to another report (on freebsd-hackers).
>>
>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>
>
> d'oh - thanks for the correction!
>
> -pete
>
>

hello, i just wanted to provide an update on this issue.  so the good
news is that by removing the file backed swap the deadlocks have indeed
gone away!  thanks for sorting me out on that front Mark!

i still am seeing a memory leak with either firefox or chrome (maybe
both where they create a voltron of memory leaks?).  this morning
firefox and chrome had been killed when i first logged in. fortunately
the system has remained responsive for several hours which was not the
case previously.

when looking at my metrics i see vm.domain.0.stats.inactive take a nose
dive from around 9GB to 0 over the course of 1min.  the timing seems to
align with the time when firefox crashed, and is preceded by a
large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before
the apps crashed.  after the binaries were killed memory metrics seem to
have recovered (laundry size grew, and inactive size grew by several
gigs for example).


maybe i'll have to gather data and post it online for anyone who would
be interested in seeing this in graph form.  although, frankly i feel
like it's a browser problem which i can work around by running them in
jails with resource limits in place via rctl.
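
if i go that route, a hypothetical rctl(8) rule set for a jail named
"browser" would look something like the following (rctl also needs
kern.racct.enable=1 set in /boot/loader.conf):

# rctl -a jail:browser:memoryuse:deny=8g
# rctl -a jail:browser:vmemoryuse:deny=16g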

Mark Millard

Apr 29, 2022, 2:40:15 PM
to Pete Wright, freebsd-current
On 2022-Apr-29, at 11:08, Pete Wright <pe...@nomadlogic.org> wrote:

> On 4/23/22 19:20, Pete Wright wrote:
>>
>>> The developers handbook has a section debugging deadlocks that he
>>> referenced in a response to another report (on freebsd-hackers).
>>>
>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>
>> d'oh - thanks for the correction!
>>
>> -pete
>>
>>
>
> hello, i just wanted to provide an update on this issue. so the good news is that by removing the file backed swap the deadlocks have indeed gone away! thanks for sorting me out on that front Mark!

Glad it helped.

> i still am seeing a memory leak with either firefox or chrome (maybe both where they create a voltron of memory leaks?). this morning firefox and chrome had been killed when i first logged in. fortunately the system has remained responsive for several hours which was not the case previously.
>
> when looking at my metrics i see vm.domain.0.stats.inactive take a nose dive from around 9GB to 0 over the course of 1min. the timing seems to align with around the time when firefox crashed, and is proceeded by a large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before the apps crashed. after the binaries were killed memory metrics seem to have recovered (laundry size grew, and inactive size grew by several gigs for example).

Since the form of kill here is tied to sustained low free memory
("failed to reclaim memory"), you might want to report the
vm.domain.0.stats.free_count figures from various time frames as
well:

vm.domain.0.stats.free_count: Free pages

(It seems you are converting pages to byte counts in your report,
the units I'm not really worried about so long as they are
obvious.)

There are also figures possibly tied to the handling of the kill
activity but some being more like thresholds than usage figures,
such as:

vm.domain.0.stats.free_severe: Severe free pages
vm.domain.0.stats.free_min: Minimum free pages
vm.domain.0.stats.free_reserved: Reserved free pages
vm.domain.0.stats.free_target: Target free pages
vm.domain.0.stats.inactive_target: Target inactive pages
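
All of those, plus the counts above, can be grabbed in
one shot with:

# sysctl vm.domain.0.stats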

Also, what value were you using for:

vm.pageout_oom_seq

?

> maybe i'll have to gather data and post it online for anyone who would be interested in seeing this in graph form. although, frankly i feel like it's a browser problem which i can work around by running them in jails with resource limits in place via rctl.




Pete Wright

Apr 29, 2022, 4:43:08 PM
to Mark Millard, freebsd-current


On 4/29/22 11:38, Mark Millard wrote:
> On 2022-Apr-29, at 11:08, Pete Wright <pe...@nomadlogic.org> wrote:
>
>> On 4/23/22 19:20, Pete Wright wrote:
>>>> The developers handbook has a section debugging deadlocks that he
>>>> referenced in a response to another report (on freebsd-hackers).
>>>>
>>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>> d'oh - thanks for the correction!
>>>
>>> -pete
>>>
>>>
>> hello, i just wanted to provide an update on this issue. so the good news is that by removing the file backed swap the deadlocks have indeed gone away! thanks for sorting me out on that front Mark!
> Glad it helped.

d'oh - went out for lunch and workstation locked up.  i *knew* i
shouldn't have said anything lol.

>> i still am seeing a memory leak with either firefox or chrome (maybe both where they create a voltron of memory leaks?). this morning firefox and chrome had been killed when i first logged in. fortunately the system has remained responsive for several hours which was not the case previously.
>>
>> when looking at my metrics i see vm.domain.0.stats.inactive take a nose dive from around 9GB to 0 over the course of 1min. the timing seems to align with around the time when firefox crashed, and is proceeded by a large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before the apps crashed. after the binaries were killed memory metrics seem to have recovered (laundry size grew, and inactive size grew by several gigs for example).
> Since the form of kill here is tied to sustained low free memory
> ("failed to reclaim memory"), you might want to report the
> vm.domain.0.stats.free_count figures from various time frames as
> well:
>
> vm.domain.0.stats.free_count: Free pages
>
> (It seems you are converting pages to byte counts in your report,
> the units I'm not really worried about so long as they are
> obvious.)
>
> There are also figures possibly tied to the handling of the kill
> activity but some being more like thresholds than usage figures,
> such as:
>
> vm.domain.0.stats.free_severe: Severe free pages
> vm.domain.0.stats.free_min: Minimum free pages
> vm.domain.0.stats.free_reserved: Reserved free pages
> vm.domain.0.stats.free_target: Target free pages
> vm.domain.0.stats.inactive_target: Target inactive pages
ok thanks Mark, based on this input and the fact i did manage to lock up
my system, i'm going to get some metrics up on my website and share them
publicly when i have time.  i'll definitely take your input into account
when sharing this info.

>
> Also, what value were you using for:
>
> vm.pageout_oom_seq
$ sysctl vm.pageout_oom_seq
vm.pageout_oom_seq: 120
$

cheers,

Mark Millard

Apr 29, 2022, 5:05:37 PM
to Pete Wright, freebsd-current
On 2022-Apr-29, at 13:41, Pete Wright <pe...@nomadlogic.org> wrote:
>
> On 4/29/22 11:38, Mark Millard wrote:
>> On 2022-Apr-29, at 11:08, Pete Wright <pe...@nomadlogic.org> wrote:
>>
>>> On 4/23/22 19:20, Pete Wright wrote:
>>>>> The developers handbook has a section debugging deadlocks that he
>>>>> referenced in a response to another report (on freebsd-hackers).
>>>>>
>>>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>>> d'oh - thanks for the correction!
>>>>
>>>> -pete
>>>>
>>>>
>>> hello, i just wanted to provide an update on this issue. so the good news is that by removing the file backed swap the deadlocks have indeed gone away! thanks for sorting me out on that front Mark!
>> Glad it helped.
>
> d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.

Any interesting console messages ( or dmesg -a or /var/log/messages )?

>>> i still am seeing a memory leak with either firefox or chrome (maybe both where they create a voltron of memory leaks?). this morning firefox and chrome had been killed when i first logged in. fortunately the system has remained responsive for several hours which was not the case previously.
>>>
>>> when looking at my metrics i see vm.domain.0.stats.inactive take a nose dive from around 9GB to 0 over the course of 1min. the timing seems to align with around the time when firefox crashed, and is proceeded by a large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before the apps crashed. after the binaries were killed memory metrics seem to have recovered (laundry size grew, and inactive size grew by several gigs for example).
>> Since the form of kill here is tied to sustained low free memory
>> ("failed to reclaim memory"), you might want to report the
>> vm.domain.0.stats.free_count figures from various time frames as
>> well:
>>
>> vm.domain.0.stats.free_count: Free pages
>>
>> (It seems you are converting pages to byte counts in your report,
>> the units I'm not really worried about so long as they are
>> obvious.)
>>
>> There are also figures possibly tied to the handling of the kill
>> activity but some being more like thresholds than usage figures,
>> such as:
>>
>> vm.domain.0.stats.free_severe: Severe free pages
>> vm.domain.0.stats.free_min: Minimum free pages
>> vm.domain.0.stats.free_reserved: Reserved free pages
>> vm.domain.0.stats.free_target: Target free pages
>> vm.domain.0.stats.inactive_target: Target inactive pages
> ok thanks Mark, based on this input and the fact i did manage to lock up my system, i'm going to get some metrics up on my website and share them publicly when i have time. i'll definitely take you input into account when sharing this info.
>
>>
>> Also, what value were you using for:
>>
>> vm.pageout_oom_seq
> $ sysctl vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
> $

Without knowing vm.domain.0.stats.free_count it is hard to
tell, but you might try, say, sysctl vm.pageout_oom_seq=12000
in hopes of getting notably more time with the
vm.domain.0.stats.free_count staying small. That may give
you more time to notice the low free RAM (if you are checking
periodically, rather than just waiting for failure to make
it obvious).

Mark Millard

May 10, 2022, 4:04:06 AM
to Pete Wright, freebsd-current
On 2022-Apr-29, at 13:57, Mark Millard <mar...@yahoo.com> wrote:

> On 2022-Apr-29, at 13:41, Pete Wright <pe...@nomadlogic.org> wrote:
>>
>>> . . .
>>
>> d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.
>
> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>

I've been doing some testing of a patch by tijl at FreeBSD.org
and have reproduced both hang-ups (ZFS/ARC context) and kills
(UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
memory", both with and without the patch. This is with only a
tiny fraction of the swap partition(s) enabled being put to
use. So far, the testing was deliberately with
vm.pageout_oom_seq=12 (the default value). My testing has been
with main [so: 14].

But I also learned how to avoid the hang-ups that I got --but
it costs making kills more likely/quicker, other things being
equal.

I discovered that the hang-ups that I got were from all the
processes that I interact with the system through ending up
with their kernel stacks swapped out and not being swapped
back in (including sshd, so no new ssh connections). In
some contexts I only had escaping into the kernel debugger
available; not even ^T would work. Other times ^T did work.

So, when I'm willing to risk kills in order to maintain
the ability to interact normally, I now use in
/etc/sysctl.conf :

vm.swap_enabled=0

This disables swapping out of process kernel stacks. It
is just that, with that option removed for gaining free RAM,
there are fewer options tried before a kill is initiated. It
is not a loader-time tunable but is writable, thus the
/etc/sysctl.conf placement.

Note that I get kills both for vm.swap_enabled=0 and for
vm.swap_enabled=1 . It is just what looks like a hangup
that I'm trying to control via using =0 .

For now, I view my use as experimental. It might require
adjusting my vm.pageout_oom_seq=120 usage.

I've yet to use protect to also prevent kills of processes
needed for the interactions ( see: man 1 protect ). Most
likely I'd try to protect enough to allow the console
interactions to avoid being killed.


For reference . . .

The type of testing is to use the likes of:

# stress -m 2 --vm-bytes ????M --vm-keep

and part of the time with grep activity also
running, such as:

# grep -r nfreed /usr/*-src/sys/ | more

for specific values where the * is. (I have
13_0R , 13_1R , 13S , and main .) Varying
the value leads to reading new material
instead of referencing buffered/cached
material from the prior grep(s).

The ???? is roughly set up so that the system
ends up about where its initial Free RAM is
used up, so near (above or below) where some
sustained paging starts. I explore figures
that make the system land in this state.
I do not have a use-exactly-this computed
figure technique. But I run into the problems
fairly easily/quickly so far. As stress itself
uses some memory, the ???? need not be strictly
based on exactly 1/2 of the initial Free RAM
value --but that figure suggests where I explore
around.
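
As a rough illustration of picking ????: take about
half of the initial free RAM and split it across the
two workers. The arithmetic, not the exact figure, is
the point here:

# free_mib=$(( $(sysctl -n vm.stats.vm.v_free_count) * $(sysctl -n hw.pagesize) / 1048576 ))
# stress -m 2 --vm-bytes $(( free_mib / 4 ))M --vm-keep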

The kills sometimes are not during the grep
but somewhat after. Sometimes, after grep is
done, stopping stress and starting it again
leads to a fairly quick kill.

The system used for the testing is an aarch64
MACCHIATObin Double Shot (4 Cortex-A72s) with
16 GiBytes of RAM. I can boot either its ZFS
media or its UFS media. (The other OS media is
normally ignored by the system configuration.)

Jan Mikkelsen

May 10, 2022, 11:49:49 AM
to Mark Millard, Pete Wright, freebsd-current
I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.

Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.

Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.

The interesting thing is that this test system is configured with no swap space.

This is on 13.1-RC5.

> This disables swapping out of process kernel stacks. It
> is just with that option removedfor gaining free RAM, there
> fewer options tried before a kill is initiated. It is not a
> loader-time tunable but is writable, thus the
> /etc/sysctl.conf placement.

Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.

Regards,

Jan M.



Mark Millard

May 10, 2022, 2:52:00 PM
to Jan Mikkelsen, Pete Wright, freebsd-current
I was going by its description:

# sysctl -d vm.swap_enabled
vm.swap_enabled: Enable entire process swapout

Based on the below, it appears that the description
presumes vm.swap_idle_enabled==0 (the default). In
my context vm.swap_idle_enabled==0 . Looks like I
should also list:

vm.swap_idle_enabled=0

in my /etc/sysctl.conf with a reminder comment that the
pair of =0's are required for avoiding the observed
hang-ups.


The analysis goes like . . .

I see in the code that vm.swap_enabled !=0 causes
VM_SWAP_NORMAL :

void
vm_swapout_run(void)
{

        if (vm_swap_enabled)
                vm_req_vmdaemon(VM_SWAP_NORMAL);
}

and that in turn leads vm_daemon to:

        if (swapout_flags != 0) {
                /*
                 * Drain the per-CPU page queue batches as a deadlock
                 * avoidance measure.
                 */
                if ((swapout_flags & VM_SWAP_NORMAL) != 0)
                        vm_page_pqbatch_drain();
                swapout_procs(swapout_flags);
        }

Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
up with swapout_flags==0. vm.swap_idle. . . defaults seem
to be (in my context):

# sysctl -a | grep swap_idle
vm.swap_idle_threshold2: 10
vm.swap_idle_threshold1: 2
vm.swap_idle_enabled: 0

For reference:

/*
 * Idle process swapout -- run once per second when pagedaemons are
 * reclaiming pages.
 */
void
vm_swapout_run_idle(void)
{
        static long lsec;

        if (!vm_swap_idle_enabled || time_second == lsec)
                return;
        vm_req_vmdaemon(VM_SWAP_IDLE);
        lsec = time_second;
}

[So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]

static void
vm_req_vmdaemon(int req)
{
        static int lastrun = 0;

        mtx_lock(&vm_daemon_mtx);
        vm_pageout_req_swapout |= req;
        if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
                wakeup(&vm_daemon_needed);
                lastrun = ticks;
        }
        mtx_unlock(&vm_daemon_mtx);
}

[So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
in vm_pageout_req_swapout.]

vm_daemon does:

        mtx_lock(&vm_daemon_mtx);
        msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
            vm_daemon_timeout);
        swapout_flags = vm_pageout_req_swapout;
        vm_pageout_req_swapout = 0;
        mtx_unlock(&vm_daemon_mtx);

So vm_pageout_req_swapout is regenerated after that
each time.

I'll not show the code for vm.swap_idle_enabled!=0 .

Mark Millard

May 10, 2022, 8:52:00 PM
to Jan Mikkelsen, Pete Wright, freebsd-current
Well, with continued experiments I got an example of
a hangup for which looking via the db> prompt did not
show any swapping out of process kernel stacks
( vm.swap_enabled=0 was the context, so expected ).
The environment was ZFS (so with ARC).

But this was testing with vm.pageout_oom_seq=120 instead
of the default vm.pageout_oom_seq=12 . It may be that,
left to sit long enough, things would have unhung (from an
external perspective).

It is part of what I'm experimenting with so we will see.

Mark Millard

May 10, 2022, 11:33:44 PM
to Jan Mikkelsen, Pete Wright, freebsd-current
Looks like I might have overreacted, in that for my
current tests there can be brief periods of delayed
response, but things respond in a little bit.
Definitely not like the hang-ups I was getting with
vm.swap_enabled=1 .

Mark Millard

May 11, 2022, 3:54:41 PM
to Jan Mikkelsen, Pete Wright, freebsd-current
The following is based on using vm.pageout_oom_seq=120
which greatly delays kills. (I've never waited long
enough.) vm.pageout_oom_seq=12 tends to get a kill
fairly quickly, making the below hard to observe.

More testing has shown it can hang up with use of
vm.swap_enabled=0 with vm.swap_idle_enabled=0 --but
the details I've observed suggest a livelock rather
than a deadlock. It appears that the likes of (extractions
from db> usage output):

1171 1168 1168 0 R+ CPU 2 stress
1170 1168 1168 0 R+ CPU 0 stress
and:
18 0 0 0 RL (threaded) [pagedaemon]
100120 Run CPU 1 [dom0]
100132 D launds 0xffff000000f1dc44 [laundry: dom0]
100133 D umarcl 0xffff0000007d8424 [uma]

stay busy using power like when I have just those
significantly active and the system is not hung-up.
(30.6W..30.8W or so range, where idle is more like
26W and more general activity being involved ends
up with the power jumping around over a wider
range, for example.)

I have observed non-hung-up tests where the 2 stress
processes using the memory were getting around 99%
in top and [pagedaemon{dom0}] was getting around 90%
but a grep was getting more like 0.04%. This looks like
a near-livelock, and it was what inspired looking at whether
more evidence suggested a livelock for a hang-up.

Looking via db> usage has always looked like the above.
(Sometimes I've used 3 memory-using stress processes but
now usually 2, leaving one CPU typically idle.)

That in turn led to monitoring the power, ending up as
mentioned above.

I have also observed hang-up-like cases where the top
that had been running would sometimes get individual
screen updates many minutes apart. With the power usage
pattern it again seems like a (near) livelock.


Relative to avoiding hang-ups, so far it seems that
use of vm.swap_enabled=0 with vm.swap_idle_enabled=0
makes hang-ups less likely/less frequent/harder to
produce examples of. But it is no guarantee of lack of
a hang-up. It does change the cause of the hang-up
(in that it avoids processes with kernel stacks swapped
out being involved).

What I do to avoid rebooting after a hang-up I'm done
with is to kill the memory-using stress processes
via db> and then c out of the kernel debugger
(i.e., continue). So far the system has always
returned to normal in response.

Pete Wright

May 13, 2022, 4:45:09 PM
to freebsd...@freebsd.org


On 5/11/22 12:52, Mark Millard wrote:
>
>
> Relative to avoiding hang-ups, so far it seems that
> use of vm.swap_enabled=0 with vm.swap_idle_enabled=0
> makes hang-ups less likely/less frequent/harder to
> produce examples of. But is no guarantee of lack of
> a hang-up. Its does change the cause of the hang-up
> (in that it avoids processes with kernel stacks swapped
> out being involved).

thanks for the above analysis Mark.  i am going to test these settings
out now as i'm still seeing the lockup.

this most recent hang-up was using a patch tijl@ asked me to test
(attached to this email), and the default setting of vm.pageout_oom_seq:
12.  interestingly enough with the patch applied i observed a smaller
amount of memory used for laundry as well as less swap space used until
right before the crash.

cheers,
-p
vm_pageout_worker.patch

Mark Millard

May 14, 2022, 4:11:55 AM
to Pete Wright, freebsd-current
Pete Wright <pete_at_nomadlogic.org> wrote on
Date: Fri, 13 May 2022 13:43:11 -0700 :

> On 5/11/22 12:52, Mark Millard wrote:
> >
> >
> > Relative to avoiding hang-ups, so far it seems that
> > use of vm.swap_enabled=0 with vm.swap_idle_enabled=0
> > makes hang-ups less likely/less frequent/harder to
> > produce examples of. But is no guarantee of lack of
> > a hang-up. Its does change the cause of the hang-up
> > (in that it avoids processes with kernel stacks swapped
> > out being involved).
>
> thanks for the above analysis Mark. i am going to test these settings
> out now as i'm still seeing the lockup.
>
> this most recent hang-up was using a patch tijl_at_ asked me to test
> (attached to this email), and the default setting of vm.pageout_oom_seq:
> 12.

I had also been running various tests for tijl_at_ , the same
sort of 'removal of the " + 1" patch'. I had found a basic
way to tell if a fundamental problem was completely
avoided or not, without having to wait long periods of
activity to do so. But that does not mean the test is a
good simulation of your context's sequence that leads to
issues. Nor does it indicate how wide a range of activity
is fairly likely to reach the failing conditions.

You could see how vm.pageout_oom_seq=120 does for you with
the patch. I was never patient enough to wait long enough
for this to OOM kill or hang-up in my test context.

I've been reporting the likes of:

# sysctl vm.domain.0.stats # done after the fact
vm.domain.0.stats.inactive_pps: 1037
vm.domain.0.stats.free_severe: 15566
vm.domain.0.stats.free_min: 25759
vm.domain.0.stats.free_reserved: 5374
vm.domain.0.stats.free_target: 86914
vm.domain.0.stats.inactive_target: 130371
vm.domain.0.stats.unswppdpgs: 0
vm.domain.0.stats.unswappable: 0
vm.domain.0.stats.laundpdpgs: 858845
vm.domain.0.stats.laundry: 9
vm.domain.0.stats.inactpdpgs: 1040939
vm.domain.0.stats.inactive: 1063
vm.domain.0.stats.actpdpgs: 407937767
vm.domain.0.stats.active: 1032
vm.domain.0.stats.free_count: 3252526

But I also have a kernel that reports just before
the call that is to cause a OOM kill, ending up
with output like:

vm_pageout_mightbe_oom: kill context: v_free_count: 15306, v_inactive_count: 1, v_laundry_count: 64, v_active_count: 3891599
May 11 00:44:11 CA72_Mbin_ZFS kernel: pid 844 (stress), jid 0, uid 0, was killed: failed to reclaim memory

(I was testing main [so: 14].) So I report that as well.

Since I was using stress as part of my test context, there
were also lines like:

stress: FAIL: [843] (415) <-- worker 844 got signal 9
stress: WARN: [843] (417) now reaping child worker processes
stress: FAIL: [843] (451) failed run completed in 119s

(tijl_at_ had me add v_laundry_count and v_active_count
to what I've had carried forward since back in 2018 when
Mark J. provided the original extra message.)

Turns out the kernel debugger (db> prompt) can report the
same general sort of figures:

db> show page
vm_cnt.v_free_count: 15577
vm_cnt.v_inactive_count: 1
vm_cnt.v_active_count: 3788852
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 272395
vm_cnt.v_free_reserved: 5374
vm_cnt.v_free_min: 25759
vm_cnt.v_free_target: 86914
vm_cnt.v_inactive_target: 130371

db> show pageq
pq_free 15577
dom 0 page_cnt 4077116 free 15577 pq_act 3788852 pq_inact 1 pq_laund 0 pq_unsw 0

(Note: pq_unsw is a non-swappable count that excludes
the wired count, apparently matching
vm.domain.0.stats.unswappable .)

The above is the most extremely small pq_inact+pq_laund that
I saw at the OOM kill time or during a "hang-up" (what I saw
across example "hang-ups" suggests to me a livelock context,
not a deadlock context).

> interestingly enough with the patch applied i observed a smaller
> amount of memory used for laundry as well as less swap space used until
> right before the crash.

If your logging of values has been made public, I've not
(yet?) looked at it at all.

None of my testing reached a stage of having much swap
space in use. But the test is biased to produce the problems
quickly, rather than to explore a range of ways to reach
conditions with the problem.

I've stopped testing for now and am doing a round of OS
building and upgrading, port (re-)building and installing
and the like, mostly for aarch64 but also for armv7 and
amd64. (This is without the 'remove " + 1"' patch.)

One of the points is to see if I get any evidence of
vm.swap_enabled=0 with vm.swap_idle_enabled=0 ending up
contributing to any problems in my normal usage. So far: no.
vm.pageout_oom_seq=120 is in use for this, my normal
context since sometime in 2018.

Pete Wright

May 29, 2022, 1:09:32 PM
to Mark Millard, freebsd-current

On 5/14/22 01:09, Mark Millard wrote:
>
> One of the points is to see if I get any evidence of
> vm.swap_enabled=0 with vm.swap_idle_enabled=0 ending up
> contributing to any problems in my normal usage. So far: no.
> vm.pageout_oom_seq=120 is in use for this, my normal
> context since sometime in 2018.

So to revive an old thread here.

it looks like setting these two sysctl knobs has helped the situation:
vm.swap_enabled=0
vm.swap_idle_enabled=0

i've gone 7 days without any OOM events under normal work usage (as
opposed to about 4 days previously).  this includes the following patch
to vm_pageout.c that tijl@ shared with us:

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 36d5f327580..df827af3075 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1069,7 +1069,7 @@ vm_pageout_laundry_worker(void *arg)
                nclean = vmd->vmd_free_count +
                    vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt;
                ndirty = vmd->vmd_pagequeues[PQ_LAUNDRY].pq_cnt;
-               if (target == 0 && ndirty * isqrt(howmany(nfreed + 1,
+               if (target == 0 && ndirty * isqrt(howmany(nfreed,
                    vmd->vmd_free_target - vmd->vmd_free_min)) >= nclean) {
                        target = vmd->vmd_background_launder_target;
                }


I have adjusted my behavior a little bit as well: since i do quite a bit
of work in the AWS console in firefox, I've been better at closing out
all of those tabs when i'm not using them (their console is a serious
memory hog).  i've also started using an official chrome binary inside
an ubuntu jail, which is where i run slack and discord; that seems to
behave better as well in terms of memory utilization.

i am going to revert the vm_pageout.c patch today when i do my weekly
rebuild of world to see how things go; maybe that'll help determine if
it's really the sysctls helping or not.

cheers,
-pete

Mark Millard

May 29, 2022, 2:18:31 PM
to Pete Wright, freebsd-current
On 2022-May-29, at 10:07, Pete Wright <pe...@nomadlogic.org> wrote:

> On 5/14/22 01:09, Mark Millard wrote:
>>
>> One of the points is to see if I get any evidence of
>> vm.swap_enabled=0 with vm.swap_idle_enabled=0 ending up
>> contributing to any problems in my normal usage. So far: no.
>> vm.pageout_oom_seq=120 is in use for this, my normal
>> context since sometime in 2018.
>
> So to revive an old thread here.
>
> it looks like setting these two sysctl knobs have helped the situation:
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
> i've gone 7 days without any OOM events under normal work usage (as opposed to about 4days previously).

FYI, the combination:

vm.pageout_oom_seq=120 # in /boot/loader.conf
vm.swap_enabled=0 # in /etc/sysctl.conf
vm.swap_idle_enabled=0 # in /etc/sysctl.conf

still has not caused me any additional problems
and helps avoid loss of access by avoiding the
relevant interaction-processes from having their
kernel stacks swapped out. (Not that the effect
of vm.swap_enabled=0 is limited to
interaction-processes.)

So, the combination is now part of the
configuration of each FreeBSD that I use.

> this includes the following patch to vm_pageout.c that tijl@ shared with us:
>
> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
> index 36d5f327580..df827af3075 100644
> --- a/sys/vm/vm_pageout.c
> +++ b/sys/vm/vm_pageout.c
> @@ -1069,7 +1069,7 @@ vm_pageout_laundry_worker(void *arg)
> nclean = vmd->vmd_free_count +
> vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt;
> ndirty = vmd->vmd_pagequeues[PQ_LAUNDRY].pq_cnt;
> - if (target == 0 && ndirty * isqrt(howmany(nfreed + 1,
> + if (target == 0 && ndirty * isqrt(howmany(nfreed,
> vmd->vmd_free_target - vmd->vmd_free_min)) >= nclean) {
> target = vmd->vmd_background_launder_target;
> }

FYI: I restored the original code after doing the testing
for tijl@ .

> I have adjusted my behavior a little bit as well, since i do quite a bit of work in the AWS console in firefox I've been better at closing out all of those tabs when i'm not using them (their console is a serious memory hog). i've also started using an official chrome binary inside an ubuntu jail which is where i run slack and discord, that seems to behave better as well in terms of memory utilization.
>
> i am going to revert the vm_pageout.c patch today when i do my weekly rebuild of world to see how things go, maybe that'll give determine if its really the sysctl's helping or not.

Pete Wright

May 29, 2022, 11:44:40 PM
to Mark Millard, freebsd-current


On 5/29/22 11:16, Mark Millard wrote:
>
> FYI, the combination:
>
> vm.pageout_oom_seq=120 # in /boot/loader.conf
> vm.swap_enabled=0 # in /etc/sysctl.conf
> vm.swap_idle_enabled=0 # in /etc/sysctl.conf
>
> still has not caused me any additional problems
> and helps avoid loss of access by avoiding the
> relevant interaction-processes from having their
> kernel stacks swapped out. (Not that the effect
> of vm.swap_enabled=0 is limited to
> interaction-processes.)
>
> So, the combination is now part of the
> configuration of each FreeBSD that I use.

awesome, thanks Mark!  i appreciate your feedback and input; i've
certainly learned quite a bit as well, which is great.

i'm going to revert the diff as well when i kick off my weekly rebuild now.