Last night 3 of the 6 nodes in our cluster crashed at the same time. This is the kernel log from node4:
May 11 21:48:02 node4 kernel: [7336182.265235] mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May 11 21:48:02 node4 kernel: [7336182.273747] mysqld cpuset=/ mems_allowed=0-1
May 11 21:48:02 node4 kernel: [7336182.275878] Pid: 27347, comm: mysqld Not tainted 3.4.37-40.44.amzn1.x86_64 #1
May 11 21:48:02 node4 kernel: [7336182.279841] Call Trace:
May 11 21:48:02 node4 kernel: [7336182.281287] [<ffffffff8110644e>] dump_header.constprop.6+0x7e/0x1b0
May 11 21:48:02 node4 kernel: [7336182.284242] [<ffffffff81233130>] ? ___ratelimit+0xa0/0x120
May 11 21:48:02 node4 kernel: [7336182.286881] [<ffffffff811067fd>] oom_kill_process.part.4.constprop.5+0x13d/0x280
May 11 21:48:02 node4 kernel: [7336182.290900] [<ffffffff811e12a5>] ? security_capable_noaudit+0x15/0x20
May 11 21:48:02 node4 kernel: [7336182.294265] [<ffffffff81054717>] ? has_capability_noaudit+0x17/0x20
May 11 21:48:02 node4 kernel: [7336182.297247] [<ffffffff81106e37>] out_of_memory+0x367/0x540
May 11 21:48:02 node4 kernel: [7336182.299944] [<ffffffff8110bfa9>] __alloc_pages_nodemask+0x8e9/0x900
May 11 21:48:02 node4 kernel: [7336182.302996] [<ffffffff812116e2>] ? queue_unplugged+0x62/0xf0
May 11 21:48:02 node4 kernel: [7336182.305729] [<ffffffffa009a7d0>] ? noalloc_get_block_write+0x30/0x30 [ext4]
May 11 21:48:02 node4 kernel: [7336182.309038] [<ffffffff81142036>] alloc_pages_current+0xb6/0x120
May 11 21:48:02 node4 kernel: [7336182.311995] [<ffffffff81102a8f>] __page_cache_alloc+0xcf/0xf0
May 11 21:48:02 node4 kernel: [7336182.314751] [<ffffffff811052f8>] filemap_fault+0x298/0x460
May 11 21:48:02 node4 kernel: [7336182.317328] [<ffffffff81124dcf>] __do_fault+0x6f/0x4d0
May 11 21:48:02 node4 kernel: [7336182.319698] [<ffffffff81127e77>] handle_pte_fault+0xf7/0x970
May 11 21:48:02 node4 kernel: [7336182.322386] [<ffffffff81129979>] handle_mm_fault+0x259/0x340
May 11 21:48:02 node4 kernel: [7336182.325571] [<ffffffff8130e5ed>] ? sock_aio_read+0x2d/0x40
May 11 21:48:02 node4 kernel: [7336182.328416] [<ffffffff813e9799>] do_page_fault+0x139/0x4e0
May 11 21:48:02 node4 kernel: [7336182.332204] [<ffffffff813e5d89>] ? _raw_spin_unlock_bh+0x19/0x20
May 11 21:48:02 node4 kernel: [7336182.335024] [<ffffffff8131264a>] ? release_sock+0xfa/0x120
May 11 21:48:02 node4 kernel: [7336182.338632] [<ffffffff8103c355>] ? pvclock_clocksource_read+0x55/0xf0
May 11 21:48:02 node4 kernel: [7336182.341820] [<ffffffff813e63e5>] page_fault+0x25/0x30
May 11 21:48:02 node4 kernel: [7336182.344426] Mem-Info:
May 11 21:48:02 node4 kernel: [7336182.345886] Node 0 DMA per-cpu:
May 11 21:48:02 node4 kernel: [7336182.348058] CPU 0: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.350483] CPU 1: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.352691] CPU 2: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.354883] CPU 3: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.357517] CPU 4: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.360319] CPU 5: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.362915] CPU 6: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.365239] CPU 7: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.367433] CPU 8: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.369906] CPU 9: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.372369] CPU 10: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.374840] CPU 11: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.377352] CPU 12: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.379548] CPU 13: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.381758] CPU 14: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.383932] CPU 15: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.386128] CPU 16: hi: 0, btch: 1 usd: 0
May 11 21:48:02 node4 kernel: [7336182.424113] Node 0 DMA32 per-cpu:
May 11 21:48:02 node4 kernel: [7336182.425959] CPU 0: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.428215] CPU 1: hi: 186, btch: 31 usd: 58
May 11 21:48:02 node4 kernel: [7336182.430416] CPU 2: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.432838] CPU 3: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.435054] CPU 4: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.437265] CPU 5: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.439458] CPU 6: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.441673] CPU 7: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.443874] CPU 8: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.446087] CPU 9: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.448336] CPU 10: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.450754] CPU 11: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.453503] CPU 12: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.455725] CPU 13: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.457934] CPU 14: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.460151] CPU 15: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.462632] CPU 16: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.464968] CPU 17: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.467251] CPU 18: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.469538] CPU 19: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.471831] CPU 20: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.474145] CPU 21: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.476683] CPU 22: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.478851] CPU 23: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.481044] CPU 24: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.483231] CPU 25: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.485765] CPU 26: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.487937] CPU 27: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.490449] CPU 28: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.492654] CPU 29: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.495148] CPU 30: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.497558] CPU 31: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.503985] Node 0 Normal per-cpu:
May 11 21:48:02 node4 kernel: [7336182.506022] CPU 0: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.508494] CPU 1: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.511403] CPU 2: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.514203] CPU 3: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.516750] CPU 4: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.518922] CPU 5: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.521127] CPU 6: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.523322] CPU 7: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.525829] CPU 8: hi: 186, btch: 31 usd: 0
May 11 21:48:02 node4 kernel: [7336182.668479] active_anon:59741220 inactive_anon:2719180 isolated_anon:0
May 11 21:48:02 node4 kernel: [7336182.668479] active_file:331 inactive_file:0 isolated_file:52
May 11 21:48:02 node4 kernel: [7336182.668480] unevictable:8 dirty:0 writeback:0 unstable:0
May 11 21:48:02 node4 kernel: [7336182.668481] free:145665 slab_reclaimable:32128 slab_unreclaimable:18577
May 11 21:48:02 node4 kernel: [7336182.668482] mapped:230 shmem:4038827 pagetables:150251 bounce:0
May 11 21:48:02 node4 kernel: [7336182.685186] Node 0 DMA free:15908kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15652kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 11 21:48:02 node4 kernel: [7336182.704317] lowmem_reserve[]: 0 3760 123027 123027
May 11 21:48:02 node4 kernel: [7336182.707051] Node 0 DMA32 free:478348kB min:1376kB low:1720kB high:2064kB active_anon:3320088kB inactive_anon:36948kB active_file:0kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):28kB present:3850496kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:37820kB slab_reclaimable:2084kB slab_unreclaimable:272kB kernel_stack:8kB pagetables:3072kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:17 all_unreclaimable? no
May 11 21:48:02 node4 kernel: [7336182.724517] lowmem_reserve[]: 0 0 119266 119266
May 11 21:48:02 node4 kernel: [7336182.727889] Node 0 Normal free:43584kB min:43672kB low:54588kB high:65508kB active_anon:111972680kB inactive_anon:9333612kB active_file:1028kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):200kB present:122129280kB mlocked:0kB dirty:0kB writeback:0kB mapped:852kB shmem:12915252kB slab_reclaimable:97860kB slab_unreclaimable:46360kB kernel_stack:1856kB pagetables:353060kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 11 21:48:02 node4 kernel: [7336182.746277] lowmem_reserve[]: 0 0 0 0
May 11 21:48:02 node4 kernel: [7336182.748642] Node 1 Normal free:44900kB min:45056kB low:56320kB high:67584kB active_anon:123672112kB inactive_anon:1506192kB active_file:4kB inactive_file:128kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:126000000kB mlocked:32kB dirty:0kB writeback:0kB mapped:0kB shmem:3202260kB slab_reclaimable:28376kB slab_unreclaimable:27672kB kernel_stack:760kB pagetables:244872kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:81 all_unreclaimable? no
May 11 21:48:02 node4 kernel: [7336182.769637] lowmem_reserve[]: 0 0 0 0
May 11 21:48:02 node4 kernel: [7336182.771944] Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
May 11 21:48:02 node4 kernel: [7336182.779264] Node 0 DMA32: 406*4kB 1259*8kB 2321*16kB 1691*32kB 1114*64kB 690*128kB 335*256kB 140*512kB 43*1024kB 7*2048kB 0*4096kB = 478368kB
May 11 21:48:02 node4 kernel: [7336182.786913] Node 0 Normal: 11237*4kB 93*8kB 14*16kB 2*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 45980kB
May 11 21:48:02 node4 kernel: [7336182.794923] Node 1 Normal: 1131*4kB 344*8kB 214*16kB 170*32kB 81*64kB 57*128kB 38*256kB 6*512kB 4*1024kB 0*2048kB 0*4096kB = 45516kB
May 11 21:48:02 node4 kernel: [7336182.802542] 4039692 total pagecache pages
May 11 21:48:02 node4 kernel: [7336182.804734] 0 pages in swap cache
May 11 21:48:02 node4 kernel: [7336182.806319] Swap cache stats: add 0, delete 0, find 0/0
May 11 21:48:02 node4 kernel: [7336182.809626] Free swap = 0kB
May 11 21:48:02 node4 kernel: [7336182.811010] Total swap = 0kB
May 11 21:48:02 node4 kernel: [7336183.254335] 63999984 pages RAM
May 11 21:48:02 node4 kernel: [7336183.256417] 1020195 pages reserved
May 11 21:48:02 node4 kernel: [7336183.258218] 826 pages shared
May 11 21:48:02 node4 kernel: [7336183.259612] 62832290 pages non-shared
May 11 21:48:02 node4 kernel: [7336183.261633] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 11 21:48:02 node4 kernel: [7336183.265074] [ 3363] 0 3363 3820 150 7 -17 -1000 udevd
May 11 21:48:02 node4 kernel: [7336183.268547] [ 3861] 0 3861 3739 97 25 -17 -1000 udevd
May 11 21:48:02 node4 kernel: [7336183.272116] [ 4014] 0 4014 2306 125 0 0 0 dhclient
May 11 21:48:02 node4 kernel: [7336183.275685] [ 4054] 0 4054 27956 104 15 -17 -1000 auditd
May 11 21:48:02 node4 kernel: [7336183.279471] [ 4067] 0 4067 60960 1702 0 0 0 rsyslogd
May 11 21:48:02 node4 kernel: [7336183.283316] [ 4078] 0 4078 3488 160 4 0 0 irqbalance
May 11 21:48:02 node4 kernel: [7336183.286987] [ 4089] 81 4089 5389 56 7 0 0 dbus-daemon
May 11 21:48:02 node4 kernel: [7336183.290994] [ 4127] 0 4127 1048 27 0 0 0 acpid
May 11 21:48:02 node4 kernel: [7336183.294928] [ 4265] 0 4265 19456 193 0 -17 -1000 sshd
May 11 21:48:02 node4 kernel: [7336183.298257] [ 4273] 0 4273 5697 60 0 0 0 xinetd
May 11 21:48:02 node4 kernel: [7336183.301753] [ 4298] 38 4298 6764 148 3 0 0 ntpd
May 11 21:48:02 node4 kernel: [7336183.305369] [ 4313] 0 4313 21773 457 12 0 0 sendmail
May 11 21:48:02 node4 kernel: [7336183.309023] [ 4321] 51 4321 19635 360 10 0 0 sendmail
May 11 21:48:02 node4 kernel: [7336183.312862] [ 4346] 0 4346 31040 152 23 0 0 crond
May 11 21:48:02 node4 kernel: [7336183.317054] [ 4357] 0 4357 5427 42 7 0 0 atd
May 11 21:48:02 node4 kernel: [7336183.320831] [ 4448] 0 4448 1047 23 0 0 0 agetty
May 11 21:48:02 node4 kernel: [7336183.324553] [ 4450] 0 4450 1044 22 14 0 0 mingetty
May 11 21:48:02 node4 kernel: [7336183.328233] [ 4454] 0 4454 1044 23 3 0 0 mingetty
May 11 21:48:02 node4 kernel: [7336183.331709] [ 4456] 0 4456 1044 21 5 0 0 mingetty
May 11 21:48:02 node4 kernel: [7336183.335198] [ 4458] 0 4458 1044 22 4 0 0 mingetty
May 11 21:48:02 node4 kernel: [7336183.338844] [ 4461] 0 4461 1044 22 0 0 0 mingetty
May 11 21:48:02 node4 kernel: [7336183.342335] [ 4466] 0 4466 3819 150 26 -17 -1000 udevd
May 11 21:48:02 node4 kernel: [7336183.345702] [ 4467] 0 4467 1044 23 13 0 0 mingetty
May 11 21:48:02 node4 kernel: [7336183.349315] [ 5253] 0 5253 32973 8130 0 0 0 mcollectived
May 11 21:48:02 node4 kernel: [7336183.352977] [ 5296] 0 5296 177224 1331 15 0 0 collectd
May 11 21:48:02 node4 kernel: [7336183.356472] [ 5354] 0 5354 34154 10779 5 0 0 puppet
May 11 21:48:02 node4 kernel: [7336183.359966] [ 5690] 0 5690 2884 88 6 0 0 mysqld_safe
May 11 21:48:02 node4 kernel: [7336183.363977] [ 6733] 27 6733 75871973 58376204 0 0 0 mysqld
May 11 21:48:02 node4 kernel: [7336183.368015] [ 7793] 219 7793 105021 5759 16 0 0 searchd
May 11 21:48:02 node4 kernel: [7336183.371736] [ 6570] 0 6570 29730 2098 5 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.375060] [ 7016] 0 7016 28951 1122 4 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.378946] [ 7522] 0 7522 28901 1255 1 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.382324] [ 7728] 0 7728 27989 344 1 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.385668] [12820] 0 12820 29080 1256 21 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.389167] [12896] 0 12896 28512 860 4 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.392661] [ 1395] 0 1395 27893 285 6 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.396161] [ 9664] 0 9664 27999 367 6 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.399728] [ 5176] 0 5176 28024 379 7 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.403152] [15032] 0 15032 27944 335 5 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.406488] [15184] 0 15184 27986 369 1 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.410095] [25739] 0 25739 28656 954 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.413724] [25793] 0 25793 27893 267 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.417337] [23950] 0 23950 27893 259 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.420668] [23990] 0 23990 27893 258 1 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.424269] [24063] 0 24063 27893 259 5 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.427876] [24371] 0 24371 27893 260 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.431672] [24473] 0 24473 27893 260 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.435179] [24753] 0 24753 27893 261 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.438971] [26253] 0 26253 27978 334 12 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.442588] [26269] 0 26269 28064 477 6 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.446445] [26312] 0 26312 27987 347 7 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.482543] [27338] 0 27338 27893 250 0 0 0 sshd
May 11 21:48:02 node4 kernel: [7336183.486744] Out of memory: Kill process 6733 (mysqld) score 930 or sacrifice child
May 11 21:48:02 node4 kernel: [7336183.490430] Killed process 6733 (mysqld) total-vm:303487892kB, anon-rss:233504816kB, file-rss:0kB
May 11 21:48:03 node4 kernel: [7336183.939107] serial8250: too much work for irq4
May 11 21:48:03 node4 kernel: [7336184.120830] serial8250: too much work for irq4
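
To put the numbers in the dump side by side, here is a rough Python sketch that parses three lines copied from the log above (the "Killed process" line, the mysqld row of the process table, and the "pages RAM" line) and converts them to comparable units. It assumes a 4 KiB page size, which is the usual x86_64 default but is not stated in the log itself. The function names and the `PAGE_KB` constant are made up for illustration, and the regexes only target the line layout shown in this particular log.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages (typical for x86_64, not stated in the log)

# Lines copied from the log above, with the syslog prefix trimmed.
KILLED_LINE = ("Killed process 6733 (mysqld) total-vm:303487892kB, "
               "anon-rss:233504816kB, file-rss:0kB")
TABLE_ROW = "[ 6733] 27 6733 75871973 58376204 0 0 0 mysqld"
RAM_LINE = "63999984 pages RAM"


def parse_killed(line):
    """Extract pid, name and kB counters from a 'Killed process' line."""
    m = re.search(
        r"Killed process (\d+) \((\S+)\) total-vm:(\d+)kB, "
        r"anon-rss:(\d+)kB, file-rss:(\d+)kB",
        line,
    )
    if m is None:
        raise ValueError("not a 'Killed process' line")
    pid, name, total_vm, anon_rss, file_rss = m.groups()
    return {
        "pid": int(pid),
        "name": name,
        "total_vm_kb": int(total_vm),
        "anon_rss_kb": int(anon_rss),
        "file_rss_kb": int(file_rss),
    }


def parse_table_row(line):
    """Parse one row of the table headed
    '[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name'.
    total_vm and rss are page counts in this table."""
    m = re.match(
        r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
        r"\s+(-?\d+)\s+(-?\d+)\s+(\S+)",
        line,
    )
    if m is None:
        raise ValueError("not a process table row")
    fields = m.groups()
    return {
        "pid": int(fields[0]),
        "total_vm_pages": int(fields[3]),
        "rss_pages": int(fields[4]),
        "name": fields[8],
    }


def parse_ram_pages(line):
    """Return the page count from an 'N pages RAM' line."""
    m = re.match(r"(\d+) pages RAM", line)
    if m is None:
        raise ValueError("not a 'pages RAM' line")
    return int(m.group(1))


def kb_to_gib(kb):
    return kb / (1024 * 1024)


killed = parse_killed(KILLED_LINE)
row = parse_table_row(TABLE_ROW)
ram_kb = parse_ram_pages(RAM_LINE) * PAGE_KB
share = killed["anon_rss_kb"] / ram_kb

print(f"{killed['name']} anon-rss: {kb_to_gib(killed['anon_rss_kb']):.1f} GiB")
print(f"total RAM:       {kb_to_gib(ram_kb):.1f} GiB")
print(f"share of RAM:    {share:.1%}")
```

With the 4 KiB assumption, the rss column of the mysqld row (58376204 pages) multiplied by 4 gives exactly the anon-rss value from the kill line (233504816 kB), and the same holds for total_vm, so the table and the kill line agree with each other. By this estimate mysqld alone held roughly 91% of the machine's RAM, and the log shows no swap configured (Total swap = 0kB). The "pages RAM" figure includes reserved pages, so treat the percentage as approximate.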