Couchbase beam.smp OOM - but there is a lot of free memory in the cluster


sirk...@gmail.com

Jul 15, 2015, 8:55:16 AM
to couc...@googlegroups.com
Hi

I have a cluster of 4 nodes, each with 30 GB of memory (Amazon EC2).
The Couchbase GUI shows memory usage of 24-28%.

The cluster memory status:
Total Allocated (32.3 GB)     Total in Cluster (70.3 GB)
In Use (17.2 GB)  Unused (15 GB)   Unallocated (37.9 GB)
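
For reference, the same totals are exposed by the REST API on any node; a quick way to dump them, assuming the default admin port 8091 and placeholder credentials:

$ curl -s -u Administrator:password http://localhost:8091/pools/default | python -m json.tool

The "storageTotals" section of that output carries the cluster-wide RAM figures shown in the GUI.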


Despite all that free memory, memory usage periodically spikes and the OOM killer fires.
It usually kills beam.smp (the cluster keeps "working" after that):

Jul 15 07:08:41 couch01 kernel: [9214515.877193] Out of memory: Kill process 6639 (beam.smp) score 786 or sacrifice child
Jul 15 07:08:41 couch01 kernel: [9214515.881859] Killed process 6639 (beam.smp) total-vm:27077936kB, anon-rss:24227172kB, file-rss:0kB
Jul 15 07:08:41 couch01 kernel: [9214515.892273] memcached invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jul 15 07:08:41 couch01 kernel: [9214515.892275] memcached cpuset=/ mems_allowed=0

Today it killed memcached and the node failed; the cluster tried to rebalance, but that failed too (a second node also crashed):

Jul 15 11:05:31 couch01 kernel: [9228726.790765] Out of memory: Kill process 117040 (memcached) score 164 or sacrifice child
Jul 15 11:05:31 couch01 kernel: [9228726.796414] Killed process 117040 (memcached) total-vm:5399148kB, anon-rss:5067456kB, file-rss:0kB

Why does this happen, when I seem to have so much free memory?
The current node itself shows plenty of free memory:

nodeA $ free -m
             total       used       free     shared    buffers     cached
Mem:         30147       8618      21528          0        161       1015
-/+ buffers/cache:       7440      22706
Swap:            0          0          0
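
To see which process actually balloons right before the OOM killer fires, I can log per-process memory over time; a minimal sketch (log path and interval are arbitrary):

$ while true; do date; ps -C beam.smp,memcached -o pid,rss,vsz,comm; sleep 60; done >> /tmp/couch-mem.log

(rss is in kB, so this should show whether it is beam.smp or memcached that grows in the minutes before the kill)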

My guess is that from time to time some task starts and does some computation that requires all of the "Total Allocated" memory on a single node?
How can I limit per-node memory usage?
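
The only knob I know of is the per-node data RAM quota, and as far as I understand it only bounds memcached (the data service), not beam.smp. A sketch of lowering it via the REST API, assuming placeholder credentials and an 18 GB (18432 MB) quota:

$ curl -X POST -u Administrator:password http://localhost:8091/pools/default -d memoryQuota=18432

The same value can also be changed from the admin console cluster settings.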

This looks like a configuration problem.
Any clues?


sirk...@gmail.com

Jul 20, 2015, 8:30:01 AM
to couc...@googlegroups.com
I have an idea that this might be the slow EBS disks (600/3000 IOPS) on our cluster nodes.
Or the backup XDCR node, which is in another zone, might have some network lag (I haven't noticed anything like that, though) or some other performance issue.

Still, that shouldn't result in eating more memory than is allowed (quota per node: 18 GB out of 30 GB), so the OOM shouldn't fire :/
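
As a stop-gap while investigating, I'm considering telling the kernel's OOM killer to prefer beam.smp over memcached, so at least the data service survives the next spike; a rough sketch (needs root, the -500 value is a guess):

$ for pid in $(pgrep -x memcached); do echo -500 | sudo tee /proc/$pid/oom_score_adj; done

Setting -1000 would exempt memcached entirely, which makes beam.smp the most likely victim again.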

Eduardo Camargo

Feb 22, 2017, 1:11:13 PM
to Couchbase
Any solution yet?
I have a similar problem.