[Lustre-discuss] Page allocation failure


Lu Wang

Aug 28, 2009, 3:22:32 AM
to lustre-discuss
Dear list,
We have seen an unusually high frequency of computing node crashes these days, after adding 10 more OSSes to the present Lustre system.
The computing nodes crashed with logs like:
Aug 27 11:50:43 bws0202 kernel: gmond: page allocation failure. order:1, mode:0x50
Aug 27 11:50:43 bws0202 kernel: [<c0144410>] __alloc_pages+0x294/0x2a6
Aug 27 11:50:43 bws0202 kernel: [<c014443a>] __get_free_pages+0x18/0x24
Aug 27 11:50:43 bws0202 kernel: [<c0146f60>] kmem_getpages+0x1c/0xbb
Aug 27 11:50:43 bws0202 kernel: [<c0147aae>] cache_grow+0xab/0x138
Aug 27 11:50:43 bws0202 kernel: [<c0147ca0>] cache_alloc_refill+0x165/0x19d
Aug 27 11:50:43 bws0202 kernel: [<c0148074>] __kmalloc+0x76/0x88
Aug 27 11:50:43 bws0202 kernel: [<f9630359>] cfs_alloc+0x29/0x70 [libcfs]
Aug 27 11:50:43 bws0202 kernel: [<f96f1407>] ptl_send_rpc+0x197/0x1790 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f96e5f24>] ptlrpc_retain_replayable_request+0x84/0x200 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f96df701>] after_reply+0x5d1/0xaa0 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f96fbfde>] lustre_msg_set_status+0x2e/0x120 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<c011e851>] __wake_up+0x29/0x3c
Aug 27 11:50:43 bws0202 kernel: [<f96e622f>] ptlrpc_queue_wait+0x18f/0x2720 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f94668e0>] lnet_me_unlink+0x40/0x260 [lnet]
Aug 27 11:50:43 bws0202 kernel: [<f9700b22>] reply_in_callback+0x1d2/0x990 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f96f9b2e>] lustre_msg_add_version+0xbe/0x130 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f9630359>] cfs_alloc+0x29/0x70 [libcfs]
Aug 27 11:50:43 bws0202 kernel: [<f96f39a3>] lustre_pack_request_v2+0x83/0x3c0 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f9696d90>] ldlm_resource_putref+0xa0/0x680 [ptlrpc]
Aug 27 11:50:43 bws0202 kernel: [<f96fbb2e>] lustre_msg_set_opc+0x2e/0x120 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f9630359>] cfs_alloc+0x29/0x70 [libcfs]
Aug 27 11:50:44 bws0202 kernel: [<f96e97bc>] ptlrpc_next_xid+0x3c/0x50 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96fc21e>] lustre_msg_set_timeout+0x2e/0x100 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96f9726>] lustre_msg_get_type+0xd6/0x210 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f98b24cb>] mdc_close+0x22b/0xdf0 [mdc]
Aug 27 11:50:44 bws0202 kernel: [<f97db0d3>] ll_release+0xd3/0x600 [lustre]
Aug 27 11:50:44 bws0202 kernel: [<f97ecda2>] ll_close_inode_openhandle+0x152/0xb80 [lustre]
Aug 27 11:50:44 bws0202 kernel: [<c0107ab4>] do_IRQ+0x1a2/0x1ae
Aug 27 11:50:44 bws0202 kernel: [<f97ecda2>] ll_close_inode_openhandle+0x152/0xb80 [lustre]
Aug 27 11:50:44 bws0202 kernel: [<c0107ab4>] do_IRQ+0x1a2/0x1ae
Aug 27 11:50:44 bws0202 kernel: [<f97ed8fb>] ll_mdc_real_close+0x12b/0x520 [lustre]
Aug 27 11:50:44 bws0202 kernel: [<f9842504>] ll_mdc_blocking_ast+0x224/0x950 [lustre]
Aug 27 11:50:44 bws0202 kernel: [<c02d6c60>] common_interrupt+0x18/0x20
Aug 27 11:50:44 bws0202 kernel: [<f96d3ee5>] ldlm_pool_del+0x75/0x2f0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f9686ac7>] ldlm_lock_destroy_nolock+0x87/0x1f0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f9685118>] unlock_res_and_lock+0x58/0xe0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f9685118>] unlock_res_and_lock+0x58/0xe0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f968e61b>] ldlm_cancel_callback+0x10b/0x160 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96e97bc>] ptlrpc_next_xid+0x3c/0x50 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f9685045>] lock_res_and_lock+0x45/0xc0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96b56b4>] ldlm_cli_cancel_local+0xa4/0x6f0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96e4459>] __ptlrpc_req_finished+0x449/0x5f0 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96fc21e>] lustre_msg_set_timeout+0x2e/0x100 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96b7b77>] ldlm_cancel_list+0x137/0x360 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96b6412>] ldlm_cli_cancel_req+0x252/0xc60 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<c01c4172>] memmove+0xe/0x24
Aug 27 11:50:44 bws0202 kernel: [<f9685cae>] ldlm_lock_remove_from_lru+0x5e/0x210 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96b8206>] ldlm_cancel_lru_local+0x126/0x480 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96b9154>] ldlm_cli_cancel_list+0x104/0x550 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<c011de76>] find_busiest_group+0xdd/0x295
Aug 27 11:50:44 bws0202 kernel: [<f96b7da0>] ldlm_cancel_shrink_policy+0x0/0x100 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96b88b2>] ldlm_cancel_lru+0x72/0x330 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<c011d2f4>] try_to_wake_up+0x28e/0x299
Aug 27 11:50:44 bws0202 kernel: [<c0107ab4>] do_IRQ+0x1a2/0x1ae
Aug 27 11:50:44 bws0202 kernel: [<f96d1946>] ldlm_cli_pool_shrink+0x166/0x440 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96d17e0>] ldlm_cli_pool_shrink+0x0/0x440 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96d1cb4>] ldlm_pool_shrink+0x44/0x160 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<c011e875>] __wake_up_locked+0x11/0x13
Aug 27 11:50:44 bws0202 kernel: [<c0104fc3>] __down_trylock+0x3d/0x46
Aug 27 11:50:44 bws0202 kernel: [<c02d3867>] __down_failed_trylock+0x7/0xc
Aug 27 11:50:44 bws0202 kernel: [<f96987fd>] .text.lock.ldlm_resource+0x2d/0x90 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<f96d4641>] ldlm_pools_shrink+0x271/0x340 [ptlrpc]
Aug 27 11:50:44 bws0202 kernel: [<c0145038>] get_writeback_state+0x30/0x35
Aug 27 11:50:44 bws0202 kernel: [<c014505d>] get_dirty_limits+0x20/0xff
Aug 27 11:50:44 bws0202 kernel: [<c0145038>] get_writeback_state+0x30/0x35
Aug 27 11:50:44 bws0202 kernel: [<c014505d>] get_dirty_limits+0x20/0xff
Aug 27 11:50:44 bws0202 kernel: [<c0149a04>] shrink_slab+0xf8/0x161
Aug 27 11:50:44 bws0202 kernel: [<c014aa3c>] try_to_free_pages+0xd5/0x1bb
Aug 27 11:50:44 bws0202 kernel: [<c0144338>] __alloc_pages+0x1bc/0x2a6
Aug 27 11:50:45 bws0202 kernel: [<c014443a>] __get_free_pages+0x18/0x24
Aug 27 11:50:45 bws0202 kernel: [<c016be0e>] __pollwait+0x2d/0x95
Aug 27 11:50:45 bws0202 kernel: [<c027f579>] datagram_poll+0x25/0xcc
Aug 27 11:50:45 bws0202 kernel: [<c0279e5d>] sock_poll+0x12/0x14
Aug 27 11:50:45 bws0202 kernel: [<c016c675>] do_pollfd+0x47/0x81
Aug 27 11:50:45 bws0202 kernel: [<c016c6f9>] do_poll+0x4a/0xac
Aug 27 11:50:45 bws0202 kernel: [<c016c91d>] sys_poll+0x1c2/0x279
Aug 27 11:50:45 bws0202 kernel: [<c016bde1>] __pollwait+0x0/0x95
Aug 27 11:50:45 bws0202 kernel: [<c0126285>] sys_gettimeofday+0x53/0xac
Aug 27 11:50:45 bws0202 kernel: [<c02d6287>] syscall_call+0x7/0xb
Aug 27 11:50:45 bws0202 kernel: [<c02d007b>] unix_stream_sendmsg+0x33/0x33a
Aug 27 11:50:45 bws0202 kernel: Mem-info:
Aug 27 11:50:45 bws0202 kernel: DMA per-cpu:
...................................................................................
Aug 27 11:50:45 bws0202 kernel: Normal per-cpu:
Aug 27 11:50:45 bws0202 kernel: cpu 0 hot: low 32, high 96, batch 16
...............................................................................................................
Aug 27 11:50:46 bws0202 kernel: Free pages: 3904248kB (3890880kB HighMem)
Aug 27 11:50:46 bws0202 kernel: Active:785397 inactive:2228918 dirty:684 writeback:512 unstable:0 free:976062 slab:156542 mapped:192154 pagetables:1461
Aug 27 11:50:46 bws0202 kernel: DMA free:12504kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Aug 27 11:50:46 bws0202 kernel: protections[]: 0 0 0
Aug 27 11:50:46 bws0202 kernel: Normal free:864kB min:928kB low:1856kB high:2784kB active:89892kB inactive:25980kB present:901120kB pages_scanned:0 all_unreclaimable? no
Aug 27 11:50:46 bws0202 kernel: protections[]: 0 0 0
Aug 27 11:50:46 bws0202 kernel: HighMem free:3890880kB min:512kB low:1024kB high:1536kB active:3051644kB inactive:8889744kB present:16646144kB pages_scanned:0 all_unreclaimable? no
Aug 27 11:50:46 bws0202 kernel: protections[]: 0 0 0
Aug 27 11:50:46 bws0202 kernel: DMA: 2*4kB 0*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12504kB
Aug 27 11:50:46 bws0202 kernel: Normal: 214*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 864kB
Aug 27 11:50:46 bws0202 kernel: HighMem: 45384*4kB 53998*8kB 35621*16kB 7967*32kB 880*64kB 814*128kB 253*256kB 94*512kB 676*1024kB 510*2048kB 108*4096kB = 3890880kB
Aug 27 11:50:46 bws0202 kernel: Swap cache: add 56, delete 56, find 0/0, race 0+0
Aug 27 11:50:46 bws0202 kernel: 0 bounce buffer pages
Aug 27 11:50:46 bws0202 kernel: Free swap: 16776984kB
Aug 27 11:50:46 bws0202 kernel: 4390912 pages of RAM
Aug 27 11:50:46 bws0202 kernel: 3964594 pages of HIGHMEM
Aug 27 11:50:46 bws0202 kernel: 232823 reserved pages
Aug 27 11:50:46 bws0202 kernel: 2673733 pages shared
Aug 27 11:50:46 bws0202 kernel: 0 pages swap cached

The computing nodes are running "lustre-1.6.5-2.6.9_55.EL.cernsmp" with 16 GB of memory on a 32-bit OS. The servers are running "2.6.9-67.0.22.EL_lustre.1.6.6smp" on a 64-bit OS. Every computing node mounts two Lustre file systems: one with 20 OSSes and one with 2 OSSes. I have set /proc/fs/lustre/llite/*/max_cached_mb=4158 for each Lustre file system. It seems that the computing nodes were running out of Normal memory when they died.
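For reference, the limit can be read back on each client like this (I would expect it to show 4158 for each of the two mounts, i.e. roughly 8316 MB of allowed client cache per node):

    # client-side cache limit for each mounted Lustre file system, in MB
    cat /proc/fs/lustre/llite/*/max_cached_mb
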
Is it possible to control the Normal memory a Lustre client uses with some tuning option?
Our servers experienced the same problem when the OS on the OSSes was 32-bit. After switching to 64-bit, the problem has not appeared any more. It is difficult for us to switch all computing nodes to 64-bit right now.

Best Regards
Lu Wang
--------------------------------------------------------------
Computing Center
IHEP Office: Computing Center,123
19B Yuquan Road Tel: (+86) 10 88236012-607
P.O. Box 918-7 Fax: (+86) 10 8823 6839
Beijing 100049,China Email: Lu....@ihep.ac.cn
--------------------------------------------------------------

_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Andreas Dilger

Aug 28, 2009, 4:30:41 AM
to Lu Wang, lustre-discuss
On Aug 28, 2009 15:22 +0800, Lu Wang wrote:
> We have seen an unusually high frequency of computing node crashes these
> days, after adding 10 more OSSes to the present Lustre system.
:
:

> Aug 27 11:50:46 bws0202 kernel: Normal free:864kB min:928kB low:1856kB high:2784kB active:89892kB inactive:25980kB present:901120kB pages_scanned:0 all_unreclaimable? no

Note that very little "Normal" memory is free. This is the only memory
that the kernel can use for its own allocations and caching.

> Aug 27 11:50:46 bws0202 kernel: HighMem free:3890880kB min:512kB low:1024kB high:1536kB active:3051644kB inactive:8889744kB present:16646144kB pages_scanned:0 all_unreclaimable? no

There is almost 4GB of highmem free, but it can't be used by kernel
allocations on 32-bit systems.
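With the usual 3G/1G split on i386 the directly mapped low memory
(DMA + Normal) is only about 896MB no matter how much RAM is installed,
which matches the present: sizes in your log (16384kB + 901120kB).
You can watch it from userspace on a client with something like:

    # low (kernel-usable) vs. high memory on a 32-bit client
    grep -E 'LowTotal|LowFree|HighTotal|HighFree' /proc/meminfo

If LowFree keeps shrinking while HighFree stays large, the client is
heading for the same allocation failure as above.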

> The computing nodes are running "lustre-1.6.5-2.6.9_55.EL.cernsmp"
> with 16 GB of memory on a 32-bit OS. The servers are running

You cannot use this memory with a 32-bit kernel. Use a 64-bit kernel
instead.

> "2.6.9-67.0.22.EL_lustre.1.6.6smp" on 64 bit OS. Every computing nodes
> are mounting two lustre: one with 20 OSS, one with 2 OSS. I have set
> /proc/fs/lustre/llite/*/max_cached_mb=4158 for each Lustre file system.

This is far too much cache for a 32-bit client. With two mounts that
allows about 2 x 4158 MB = 8316 MB of cached data per node, and the
kernel structures needed to track it (locks, inodes, slab) all have to
fit in the small Normal zone; your log already shows slab:156542 pages,
about 611 MB of the roughly 896 MB of low memory.

> Is it possible to control the Normal memory a Lustre client uses with some tuning option?

Reducing max_cached_mb to a much smaller value (e.g. 1024 MB per mount)
may help avoid these problems.
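Since it is a /proc tunable it can be changed on a running client
without remounting. A rough sketch (1024 is just an example value,
and it needs to be reapplied after a remount or reboot unless it is
put in a boot script):

    # lower the client cache limit to 1024 MB on every mounted Lustre fs
    for f in /proc/fs/lustre/llite/*/max_cached_mb; do
        echo 1024 > $f
    done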

> Our servers experienced the same problem when the OS on the OSSes was
> 32-bit. After switching to 64-bit, the problem has not appeared any more.
> It is difficult for us to switch all computing nodes to 64-bit right now.

You can still run 32-bit applications with a 64-bit kernel, if that is
needed, as long as you also install the 32-bit userspace libraries.
You would need to install the 64-bit Lustre tools, but that should be fine.
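To check whether a node is ready for that, something like the
following should do (untested; replace the last path with one of
your own binaries):

    # does the CPU support 64-bit (long mode)?
    grep -qw lm /proc/cpuinfo && echo "64-bit capable" || echo "32-bit only"
    # architecture of the running kernel
    uname -m
    # is an application built as a 32-bit or 64-bit ELF?
    file /path/to/your/application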

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
