swapping is completely broken in -CURRENT r334649?

Lev Serebryakov

Jun 5, 2018, 12:00:25 PM

I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"…

--
// Lev Serebryakov

Gary Jennejohn

Jun 5, 2018, 12:21:11 PM

On Tue, 5 Jun 2018 18:55:52 +0300
Lev Serebryakov <l...@FreeBSD.org> wrote:

> I have 16G of free swap (out of 16G configured), but programs are
> killed due to "out of swap space"
>

I complained about this also and alc@ gave me this hint:
sysctl vm.pageout_update_period=0

I don't know whether it will help, but you can give it a try.

--
Gary Jennejohn
_______________________________________________
freebsd...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Lev Serebryakov

Jun 5, 2018, 5:14:05 PM

On 05.06.2018 19:17, Gary Jennejohn wrote:

>> I have 16G of free swap (out of 16G configured), but programs are
>> killed due to "out of swap space"
>>
>
> I complained about this also and alc@ gave me this hint:
> sysctl vm.pageout_update_period=0
>
> I don't know whether it will help, but you can give it a try.

Looks like it helps a little. A very resource-hungry operation completed,
but about 10 minutes later, after the compilation had finished and swap
was clear again, the system started to kill processes. WTF?!

--
// Lev Serebryakov

Lev Serebryakov

Jun 5, 2018, 5:26:41 PM

On 05.06.2018 19:17, Gary Jennejohn wrote:

> I complained about this also and alc@ gave me this hint:
> sysctl vm.pageout_update_period=0

Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is a
lot of free memory already!

It looks like a very serious bug.

--
// Lev Serebryakov

Mark Johnston

Jun 5, 2018, 5:52:54 PM

The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
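
A minimal sketch of how the suggested workaround could be applied and kept across reboots. The helper name persist_sysctl is hypothetical, and the sketch assumes that settings listed in /etc/sysctl.conf are applied at boot. The value 1000 is the one suggested above, not a tuned recommendation.

```shell
#!/bin/sh
# Sketch: record a sysctl setting in a config file (default: /etc/sysctl.conf).
# persist_sysctl is a hypothetical helper, not a system tool.
persist_sysctl() {
    name=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    touch "$conf" || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except an existing assignment of the same name.
    awk -F= -v n="$name" '$1 != n' "$conf" > "$tmp"
    # Append the new assignment.
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# On the affected machine, as root:
#   sysctl vm.pageout_oom_seq          # show the current value
#   sysctl vm.pageout_oom_seq=1000     # apply at runtime
#   persist_sysctl vm.pageout_oom_seq 1000
```

Raising the value delays genuine OOM kills as well, so it is worth removing the line again once the kernel carries the fix.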

Kevin Lo

Jun 15, 2018, 1:16:23 AM

On Tue, Jun 05, 2018 at 05:48:08PM -0400, Mark Johnston wrote:
>
> On Wed, Jun 06, 2018 at 12:22:08AM +0300, Lev Serebryakov wrote:
> > On 05.06.2018 19:17, Gary Jennejohn wrote:
> >
> >
> > > I complained about this also and alc@ gave me this hint:
> > > sysctl vm.pageout_update_period=0
> >
> > Really, the situation is worse than stated in the subject, because
> > processes are being killed AFTER the memory pressure, when there is a
> > lot of free memory already!
> >
> > It looks like a very serious bug.
>
> The issue was identified earlier this week and is being worked on. It's
> a regression from r329882 which appears only on certain hardware. You
> can probably work around it by setting vm.pageout_oom_seq to a large
> value (try 1000 for instance), though this will make the "true" OOM
> killer take longer to kick in. The problem is unrelated to the
> pageout_update_period.

I have a large swap space and I've encountered this issue as well:

pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...

Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
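
When reporting kills like the ones above, a per-process summary can be easier to read than the raw log. A minimal sketch, assuming the kernel messages keep the "pid N (name), uid N, was killed: out of swap space" form shown here; the function name count_oom_kills is hypothetical.

```shell
#!/bin/sh
# Sketch: count "was killed: out of swap space" messages per process name.
# Reads kernel messages on stdin, e.g.:  dmesg | count_oom_kills
# count_oom_kills is a hypothetical helper name.
count_oom_kills() {
    awk '
        /was killed: out of swap space/ {
            # The process name is the text inside the first parentheses.
            if (match($0, /\([^)]*\)/))
                kills[substr($0, RSTART + 1, RLENGTH - 2)]++
        }
        END {
            for (name in kills)
                printf "%s %d\n", name, kills[name]
        }
    ' | sort
}
```

Each output line is a process name followed by the number of matching kill messages.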

Kevin

Mark Johnston

Jun 15, 2018, 4:44:29 AM

On Fri, Jun 15, 2018 at 01:10:25PM +0800, Kevin Lo wrote:
> On Tue, Jun 05, 2018 at 05:48:08PM -0400, Mark Johnston wrote:
> >
> > On Wed, Jun 06, 2018 at 12:22:08AM +0300, Lev Serebryakov wrote:
> > > On 05.06.2018 19:17, Gary Jennejohn wrote:
> > >
> > >
> > > > I complained about this also and alc@ gave me this hint:
> > > > sysctl vm.pageout_update_period=0
> > >
> > > Really, the situation is worse than stated in the subject, because
> > > processes are being killed AFTER the memory pressure, when there is a
> > > lot of free memory already!
> > >
> > > It looks like a very serious bug.
> >
> > The issue was identified earlier this week and is being worked on. It's
> > a regression from r329882 which appears only on certain hardware. You
> > can probably work around it by setting vm.pageout_oom_seq to a large
> > value (try 1000 for instance), though this will make the "true" OOM
> > killer take longer to kick in. The problem is unrelated to the
> > pageout_update_period.
>
> I have a large swap space and I've encountered this issue as well
>
> pid 90707 (getty), uid 0, was killed: out of swap space
> pid 90709 (getty), uid 0, was killed: out of swap space
> pid 90709 (getty), uid 0, was killed: out of swap space
> ...
>
> Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
> happy to test it, thanks.

The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?

Kurt Jaeger

Jun 15, 2018, 4:51:58 AM

Hi!

> The change was committed as r334752. Are you seeing unexpected OOM
> kills on or after that revision?

When I tried to run a qemu-based poudriere run yesterday on a r334918
box, it killed a few processes outside of that run and did not
work out.

I'm unsure whether it was because of that problem or a problem with qemu.

--
p...@opsec.eu +49 171 3101372 2 years to go !

Mark Johnston

Jun 15, 2018, 5:07:57 AM

On Fri, Jun 15, 2018 at 10:48:08AM +0200, Kurt Jaeger wrote:
> Hi!
>
> > The change was committed as r334752. Are you seeing unexpected OOM
> > kills on or after that revision?
>
> When I tried to run a qemu-based poudriere run yesterday on a r334918
> box, it killed a few processes outside of that run and did not
> work out.
>
> I'm unsure it was because of that problem or a problem with qemu.

How much memory and swap does the guest have? Were you consistently
able to complete a run before?

If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
 	 * start OOM. Initiate the selection and signaling of the
 	 * victim.
 	 */
+	printf("v_free_count: %u, v_inactive_count: %u\n",
+	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
 	vm_pageout_oom(VM_OOM_MEM);
 
 	/*
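
The two counters printed by the patch are page counts, so converting them to MiB makes them easier to compare with the machine's RAM size. A minimal sketch, assuming 4096-byte pages unless PAGE_SIZE is set (the real value can be checked with "sysctl -n hw.pagesize"); the function name oom_stats_mib is hypothetical.

```shell
#!/bin/sh
# Sketch: convert the page counts printed by the debugging patch to MiB.
# Assumes 4096-byte pages unless PAGE_SIZE is set in the environment.
# oom_stats_mib is a hypothetical helper name.
oom_stats_mib() {
    awk -v pagesize="${PAGE_SIZE:-4096}" '
        /v_free_count: [0-9]+, v_inactive_count: [0-9]+/ {
            # Drop everything up to the first number, then split the pair.
            line = $0
            sub(/^.*v_free_count: /, "", line)
            split(line, parts, ", v_inactive_count: ")
            free = parts[1] + 0
            inactive = parts[2] + 0
            printf "free: %.1f MiB, inactive: %.1f MiB\n",
                free * pagesize / 1048576, inactive * pagesize / 1048576
        }
    '
}

# Usage on a kernel built with the patch:
#   dmesg | oom_stats_mib
```

If the free and inactive totals are large at the moment of a kill, that suggests the spurious OOM described in this thread rather than a true memory shortage.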

Kurt Jaeger

Jun 15, 2018, 5:12:15 AM

Hi!

> > > The change was committed as r334752. Are you seeing unexpected OOM
> > > kills on or after that revision?
> >
> > When I tried to run a qemu-based poudriere run yesterday on a r334918
> > box, it killed a few processes outside of that run and did not
> > work out.
> >
> > I'm unsure it was because of that problem or a problem with qemu.
>
> How much memory and swap does the guest have?

It's started by poudriere, I do not really know.

> Were you consistently able to complete a run before?

Two years ago, on a much older version of FreeBSD, yes.

I just started it again, and after a while the qemu-ppc64-static
was at approx. 23 GB memory and increasing, without much progress.

> If it's happening during a poudriere run, it may well have been a true
> OOM situation. The patch below prints a few stats to the dmesg before
> the kill. The output of that together with "sysctl vm" output should be
> enough to determine what's happening.
>
> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
> index 264c98203c51..9c7ebcf451ec 100644
> --- a/sys/vm/vm_pageout.c
> +++ b/sys/vm/vm_pageout.c
> @@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
>  	 * start OOM. Initiate the selection and signaling of the
>  	 * victim.
>  	 */
> +	printf("v_free_count: %u, v_inactive_count: %u\n",
> +	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
>  	vm_pageout_oom(VM_OOM_MEM);
>
>  	/*

I'll have a look at this.

--
p...@opsec.eu +49 171 3101372 2 years to go !

Mark Johnston

Jun 15, 2018, 5:13:53 AM

On Fri, Jun 15, 2018 at 11:07:34AM +0200, Kurt Jaeger wrote:
> Hi!
>
> > > > The change was committed as r334752. Are you seeing unexpected OOM
> > > > kills on or after that revision?
> > >
> > > When I tried to run a qemu-based poudriere run yesterday on a r334918
> > > box, it killed a few processes outside of that run and did not
> > > work out.
> > >
> > > I'm unsure it was because of that problem or a problem with qemu.
> >
> > How much memory and swap does the guest have?
>
> It's started by poudriere, I do not really know.
>
> > Were you consistently able to complete a run before?
>
> Two years ago, on a much lower version of FreeBSD, yes.
>
> I just started it again, and after a while the qemu-ppc64-static
> was at approx. 23 GB memory and increasing, without much progress.

I suspect it is a different issue then.

Mikaël Urankar

Jun 15, 2018, 5:19:14 AM

On Fri, Jun 15, 2018 at 11:10, Kurt Jaeger <li...@opsec.eu> wrote:

> Hi!
>
> > > > The change was committed as r334752. Are you seeing unexpected OOM
> > > > kills on or after that revision?
> > >
> > > When I tried to run a qemu-based poudriere run yesterday on a r334918
> > > box, it killed a few processes outside of that run and did not
> > > work out.
> > >
> > > I'm unsure it was because of that problem or a problem with qemu.
> >
> > How much memory and swap does the guest have?
>
> It's started by poudriere, I do not really know.
>
> > Were you consistently able to complete a run before?
>
> Two years ago, on a much lower version of FreeBSD, yes.
>
> I just started it again, and after a while the qemu-ppc64-static
> was at approx. 23 GB memory and increasing, without much progress.

Last time I tried (2 weeks ago) qemu-ppc64-static was broken; I'm not sure
whether the situation has changed since then.

Kurt Jaeger

Jun 15, 2018, 5:39:49 AM

Hi!

> > I just started it again, and after a while the qemu-ppc64-static
> > was at approx. 23 GB memory and increasing, without much progress.

> Last time I tried (2 weeks ago) qemu-ppc64-static was broken, not sure the
> situation has evolved since that.

Ok, thanks! Then it's not the same problem.

--
p...@opsec.eu +49 171 3101372 2 years to go !

Kevin Lo

Jun 15, 2018, 9:52:13 AM

The box is running -CURRENT r334983. I'll investigate further, thanks.

Mark Linimon

Jun 15, 2018, 10:38:19 PM

On Fri, Jun 15, 2018 at 11:14:40AM +0200, Mikaël Urankar wrote:
> Last time I tried (2 weeks ago) qemu-ppc64-static was broken, not sure the
> situation has evolved since that.

I've been told by more than one person that it works, but the two or three
times I've tried it, it just hung.

I have real hardware so in general it doesn't make a difference to me,
but I'd like to know one way or the other.

mcl