Server became inaccessible because it ran out of swap space


Odhiambo Washington

3:57 AM
to questions
I have a server with 64GB RAM and 2 CPUs, each with 16 cores. I have also configured 13GB of swap space.

```
root@gw:/usr/local/bhyve-vms/scripts # swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p3       3163136   703316  2459820    22%
/dev/md0.eli     10485760   709352  9776408     7%
Total            13648896  1412668 12236228    10%
root@gw:/usr/local/bhyve-vms/scripts #
```

A number of times it has become inaccessible until I do a hard reboot. I believe this has been caused by running out of swap.

Below is what I have obtained from /var/log/messages after I rebooted.

How do I identify the culprit and arrest the situation?


```
Jul  5 06:50:56 gw kernel: failed
Jul  5 06:52:11 gw kernel: failed
Jul  5 06:52:11 gw kernel: out of swap space
Jul  5 06:52:11 gw kernel: failed
Jul  5 06:52:11 gw kernel: failed
Jul  5 06:52:12 gw kernel: failed
Jul  5 06:52:12 gw kernel: failed
Jul  5 06:54:06 gw kernel: out of swap space
Jul  5 06:54:06 gw kernel: failed
Jul  5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
Jul  5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
Jul  5 07:16:30 gw kernel: tap4: link state changed to DOWN
Jul  5 07:16:30 gw kernel: out of swap space
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
Jul  5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
Jul  5 07:16:30 gw kernel: tap5: link state changed to DOWN
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:30 gw kernel: sonewconn: pcb 0xfffff8002866d100 (local:/var/run/wsgi.38620.0.1.sock): Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
Jul  5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
Jul  5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
Jul  5 07:16:30 gw kernel: tap3: link state changed to DOWN
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:30 gw kernel: out of swap space
Jul  5 07:16:30 gw kernel: failed
Jul  5 07:16:31 gw kernel: failed
Jul  5 07:16:31 gw kernel: failed
Jul  5 07:16:32 gw kernel: out of swap space
Jul  5 07:16:33 gw kernel: out of swap space
Jul  5 07:16:33 gw kernel: failed
Jul  5 07:16:33 gw kernel: failed
Jul  5 07:16:34 gw kernel: out of swap space
Jul  5 07:16:34 gw kernel: failed
Jul  5 07:16:36 gw kernel: failed
Jul  5 07:16:36 gw kernel: failed
Jul  5 07:16:36 gw kernel: failed
Jul  5 07:16:36 gw kernel: failed
Jul  5 07:16:36 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:37 gw kernel: failed
Jul  5 07:16:38 gw kernel: failed
```


--
Best regards,
Odhiambo WASHINGTON,
Nairobi,KE
+254 7 3200 0004/+254 7 2274 3223
 In an Internet failure case, the #1 suspect is a constant: DNS.
"Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)

David Palma

4:28 AM
to Odhiambo Washington, questions
Hi,
I'm not sure, but the bhyve processes being killed remind me
of an earlier issue that was solved with:

`vm.disable_swapspace_pageouts=1`
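
For anyone wanting to try this, here is a minimal sketch of how a sysctl like this is usually applied on FreeBSD. It assumes a root shell and the stock `/etc/sysctl.conf`. Nothing in this thread confirms that the setting cures the hang, so treat it as an experiment rather than a known fix.

```shell
# Apply the setting to the running kernel (needs root).
sysctl vm.disable_swapspace_pageouts=1

# Persist it across reboots; assumes the line is not already in the file.
echo 'vm.disable_swapspace_pageouts=1' >> /etc/sysctl.conf

# Read the value back to confirm it took effect.
sysctl vm.disable_swapspace_pageouts
```

The change can be reverted by setting the value back to 0 and removing the line from `/etc/sysctl.conf`.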

Cheers,
David


Odhiambo Washington

4:48 AM
to David Palma, questions
Hello David,

Thank you for this.

Let me enable this and monitor. 
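
One possible way to do the monitoring and to narrow down the culprit is sketched below. The helper name `killed_summary` is made up for illustration. It only counts the "was killed: failed to reclaim memory" lines shown in the log above, so it reports which processes the kernel killed, which is not necessarily the process that consumed the memory.

```shell
#!/bin/sh
# Hypothetical helper: summarise processes the kernel killed under memory
# pressure. Prints one line per (command, pid) with the number of matching
# log lines in front.
killed_summary() {
    # $1: path to a syslog file such as /var/log/messages
    grep 'was killed: failed to reclaim memory' "$1" |
        sed -E 's/.*pid ([0-9]+) \(([^)]*)\).*/\2 \1/' |
        sort | uniq -c
}

# On the FreeBSD host one might then run, as root:
#   killed_summary /var/log/messages
#   swapinfo -h                                   # current swap usage
#   ps -axm -o pid,rss,vsz,command | head -n 15   # largest memory users first
```

Running the `swapinfo` and `ps` commands periodically (for example from cron) and keeping the output would show which processes grow before the next "out of swap space" message appears.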