Lustre 1.8.0 on all servers / clients involved in this. OS is SLES 10
SP2 with an un-patched kernel on the clients. I have, however, put the
same kernel revision (downloaded from suse.com) on the clients as the
version used in the Lustre-patched MGS/MDS/OSS servers. The file system
is only several GB, with ~500000 files. All interconnections are over TCP.
We have some “manual” replication from an active Lustre file system to a
passive Lustre file system. We have “sync” clients that basically
mount both file systems and run large rsync jobs from the active Lustre
to the passive Lustre. So far, so good (apart from it being quite a slow
process). However, my issue is that Lustre's memory usage rises so high
that rsync cannot get enough RAM to finish its job before kswapd kicks
in and slows things down drastically.
Up to now, I have succeeded in fine-tuning things using the following
steps in my rsync script:
########
umount /opt/lustre_a
umount /opt/lustre_z
mount /opt/lustre_a
mount /opt/lustre_z
for i in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 4 > $i; done
for i in /proc/fs/lustre/ldlm/namespaces/*/lru_max_age; do echo 30 > $i; done
for i in /proc/fs/lustre/llite/*/max_cached_mb; do echo 64 > $i; done
echo 64 > /proc/sys/lustre/max_dirty_mb
lctl set_param ldlm.namespaces.*osc*.lru_size=100
sysctl -w lnet.debug=0
########
What I still don't understand is this: even when putting a max limit of
a few MB on the read cache (max_cached_mb / max_dirty_mb) and setting the
write cache (lru_max_age ? is that correct ?) to a very limited number,
memory usage still sky-rockets to several GB in /proc/sys/lustre/memused.
As soon as I unmount the file systems, it drops. The memused number,
however, will not decrease even if the client remains idle for several
days with no I/O from/to any Lustre file system. Note that cutting the
rsync work into smaller but more numerous jobs does not help, unless
I start unmounting and re-mounting the Lustre file systems between
each job (which is nevertheless what I may have to plan if no further
parameter helps me)!
Any help/guidance/hint/... is very much appreciated.
Thank you,
Guillaume Demillecamps
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Note that you can do these more easily with
lctl set_param osc.*.max_dirty_mb=4
lctl set_param ldlm.namespaces.*.lru_max_age=30
lctl set_param llite.*.max_cached_mb=64
lctl set_param max_dirty_mb=64
> lctl set_param ldlm.namespaces.*osc*.lru_size=100
> sysctl -w lnet.debug=0
This can also be "lctl set_param debug=0".
> What I still don't understand is that even when putting a max limit of
> a few MB of read-cache (max_cached_mb / max_dirty_mb) and putting the
> write-cache (lru_max_age ? is it correct ?) to a very limited number,
> it still sky-rise to several GBs in /proc/sys/lustre/mem_used ?
Can you please check /proc/slabinfo to see what kind of memory is being
allocated the most? The max_cached_mb/max_dirty_mb are only limits on
the cached/dirty data pages, and not for metadata structures. Also,
in 30s I expect you can have a LOT of inodes traversed, so that might
be your problem, and even then lock cancellation does not necessarily
force the kernel dentry/inode out of memory.
Getting total lock counts would also help:
lctl get_param ldlm.namespaces.*.resource_count
You might be able to tweak some of the "normal" (not Lustre-specific)
/proc parameters to flush the inodes from cache more quickly, or increase
the rate at which kswapd is trying to flush unused inodes.
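As a sketch of that idea (these are the standard Linux VM knobs, nothing Lustre-specific; the value 200 is just an illustrative choice, not a recommendation):

```shell
# vm.vfs_cache_pressure defaults to 100; values above 100 bias the VM
# toward reclaiming dentries and inodes over data pages, so unused
# inode cache is dropped sooner.
sysctl -w vm.vfs_cache_pressure=200

# Watch how much slab memory the dentry/inode caches currently hold:
grep -E 'dentry|inode_cache' /proc/slabinfo
```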
> And as soon as I un-mount the disks, it drops. The memused number however
> will not decrease even if the client remains idle for several days
> with no i/o from/to any lustre file systems. Note that cutting the
> rsync jobs in smaller but more numbered jobs is not helping.
There is a test program called "memhog" that could force memory to be
flushed between jobs, but that is a sub-standard solution.
> Unless
> I'd start un-mounting and re-mounting the lustre file systems between
> each job (which is nevertheless what I may have to plan if there is no
> further parameter which would help me) !
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
First of all thank you for your time.
You can find here attached the information you asked.
If you can keep on spending some more of your time on this...
Your help is greatly appreciated !
Best regards,
Guillaume Demillecamps
----- Message from adi...@sun.com ---------
Date: Wed, 29 Jul 2009 16:46:27 -0600
From: Andreas Dilger <adi...@sun.com>
Subject: Re: [Lustre-discuss] Lustre client memory usage very high
To: Guillaume Demillecamps <guil...@multipurpose.be>
Cc: lustre-...@lists.lustre.org
----- End of message from adi...@sun.com -----
> # name <active> <total> <size> <obj/slab>: slabdata <active> <num>
> lustre_inode_cache 385652 385652 960 4 : slabdata 96413 96413
> lov_oinfo 2929548 2929548 320 12 : slabdata 244129 244129
> ldlm_locks 136262 254424 512 8 : slabdata 31803 31803
> ldlm_resources 136183 256120 384 10 : slabdata 25612 25612
This shows that we have 385k Lustre inodes, yet there are 2.9M "lov_oinfo"
structs (there should only be a single one per inode). I'm not sure
why that is happening, but that is consuming about 1GB of RAM. The 385k
inode count is reasonable, given you have 500k files, per above. There
are 136k locks, which is also fine (probably so much lower than the inode
count because of your short lock expiry time).
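As a quick sanity check on that figure, multiplying the quoted slabinfo columns (num_objs × obj_size) gives the lov_oinfo footprint:

```shell
# num_objs * obj_size for lov_oinfo, taken from the slabinfo output
# above: 2929548 objects of 320 bytes each.
awk 'BEGIN { printf "%.1f MB\n", 2929548 * 320 / (1024 * 1024) }'
# -> 894.0 MB, i.e. close to 1 GB once slab overhead is included
```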
So, it seems like a problem of some kind, and is probably deserving of
filing a bug.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Not sure whether it is worth noting, but if I use the following
command, my memory is freed:
sync; echo 3 > /proc/sys/vm/drop_caches
What is surprising, though, is that the cache never expires on its own
(it remains in memory for several days at the least).
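For what it's worth, a minimal sketch of using that between the sync jobs (the paths and job list here are made up for illustration) would look like:

```shell
# Hypothetical sketch: drop the caches between rsync jobs instead of
# unmounting/remounting the Lustre file systems.
for src in /opt/lustre_a/dir1 /opt/lustre_a/dir2; do
    rsync -a "$src/" "/opt/lustre_z/${src##*/}/"
    sync                                # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches   # then free pagecache + dentries/inodes
done
```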
Regards,
Guillaume Demillecamps
----- Message from adi...@sun.com ---------
Date: Thu, 30 Jul 2009 16:45:47 -0600
From: Andreas Dilger <adi...@sun.com>
Subject: Re: [Lustre-discuss] Lustre client memory usage very high
To: Guillaume Demillecamps <guil...@multipurpose.be>
Cc: lustre-...@lists.lustre.org
----- End of message from adi...@sun.com -----
All servers and clients run Lustre 1.8 on SLES 10 SP2. Clients
use patchless kernels, with the same base revision as the ones used for
the patched-kernel servers.
We recurrently encounter this error:
Server log :
------------
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError:
22061:0:(mds_open.c:1665:mds_close()) @@@ no handle for file close ino
5606195: cookie 0x5ed7d8c3d1299f40 req@ffff810065a60400
x1308791892785337/t0
o35->4f104403-eb03-83be-2910-2fd7cc26087c@NET_0x20000c0a84410_UUID:0/0
lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc 0/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError:
22061:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error
(-116) req@ffff810065a60400 x1308791892785337/t0
o35->4f104403-eb03-83be-2910-2fd7cc26087c@NET_0x20000c0a84410_UUID:0/0
lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc -116/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError:
22061:0:(mds_open.c:1665:mds_close()) @@@ no handle for file close ino
5606200: cookie 0x5ed7d8c3d129a361 req@ffff810071b28400
x1308791892785342/t0
o35->4f104403-eb03-83be-2910-2fd7cc26087c@NET_0x20000c0a84410_UUID:0/0
lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc 0/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError:
22061:0:(mds_open.c:1665:mds_close()) Skipped 4 previous similar
messages
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError:
22061:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error
(-116) req@ffff810071b28400 x1308791892785342/t0
o35->4f104403-eb03-83be-2910-2fd7cc26087c@NET_0x20000c0a84410_UUID:0/0
lens 408/864 e 0 to 0 dl 1248927113 ref 1 fl Interpret:/0/0 rc -116/0
Jul 30 06:11:47 BEESPBESXFIL27 kernel: LustreError:
22061:0:(ldlm_lib.c:1826:target_send_reply_msg()) Skipped 4 previous
similar messages
Client log:
-----------
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 11-0: an error
occurred while communicating with 172.16.0.55@tcp. The mds_close
operation failed with -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError:
13298:0:(file.c:114:ll_close_inode_openhandle()) inode 5606195 mdc
close failed: rc = -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError:
13298:0:(file.c:114:ll_close_inode_openhandle()) Skipped 1 previous
similar message
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError:
13298:0:(file.c:114:ll_close_inode_openhandle()) inode 5606155 mdc
close failed: rc = -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError:
13298:0:(file.c:114:ll_close_inode_openhandle()) Skipped 3 previous
similar messages
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: 11-0: an error
occurred while communicating with 172.16.0.55@tcp. The mds_close
operation failed with -116
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: Skipped 7 previous
similar messages
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError:
13298:0:(ldlm_lock.c:602:ldlm_lock_decref_internal_nolock())
ASSERTION(lock->l_writers > 0) failed
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError:
13298:0:(ldlm_lock.c:602:ldlm_lock_decref_internal_nolock()) LBUG
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
Jul 30 06:11:47 BEESPDESXAPP06 kernel: Call Trace:
<ffffffff88257aea>{:libcfs:lbug_with_loc+122}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8825fe00>{:libcfs:tracefile_init+0}
<ffffffff8835d566>{:ptlrpc:ldlm_lock_decref_internal_nolock+182}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8838533b>{:ptlrpc:ldlm_process_flock_lock+4139}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff883864ef>{:ptlrpc:ldlm_flock_completion_ast+2111}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8835f4a9>{:ptlrpc:ldlm_lock_enqueue+2169}
<ffffffff88377ca0>{:ptlrpc:ldlm_cli_enqueue_fini+2624}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff88376fd3>{:ptlrpc:ldlm_prep_elc_req+755}
<ffffffff8835bc0d>{:ptlrpc:ldlm_lock_create+2541}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8012c668>{default_wake_function+0}
<ffffffff88379ae2>{:ptlrpc:ldlm_cli_enqueue+1666}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff88523fcf>{:lustre:ll_file_flock+1407}
<ffffffff88385cb0>{:ptlrpc:ldlm_flock_completion_ast+0}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8019ae2e>{locks_remove_posix+132}
<ffffffff80147fdc>{bit_waitqueue+56}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff80190241>{flush_old_exec+2729} <ffffffff80186fc1>{__fput+355}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8018455b>{filp_close+84}
<ffffffff801360b7>{put_files_struct+107}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8010aecb>{sysret_signal+28} <ffffffff8013725c>{do_exit+684}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff80137995>{sys_exit_group+0}
<ffffffff8014083c>{get_signal_to_deliver+1394}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8010aecb>{sysret_signal+28} <ffffffff8010a19c>{do_signal+118}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8012c668>{default_wake_function+0}
<ffffffff8014b227>{do_futex+104}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff801743b2>{sys_mprotect+1742}
<ffffffff8010aecb>{sysret_signal+28}
Jul 30 06:11:47 BEESPDESXAPP06 kernel:
<ffffffff8010b14f>{ptregscall_common+103}
Jul 30 06:11:47 BEESPDESXAPP06 kernel: LustreError: dumping log to
/tmp/lustre-log.1248927107.13298
Jul 30 06:11:47 BEESPDESXAPP06 kernel: Fixing recursive fault but
reboot is needed!
Then indeed a reboot of the client is required. What does this mean?
Could it be related to sys timeouts and/or ldlm_timeout being too short?
Regards,
Guillaume Demillecamps
On Jul 31, 2009, at 3:15 AM, Guillaume Demillecamps wrote:
> All servers and clients run Lustre 1.8 on SLES 10 SP2. Clients
> use patchless kernels, with the same base revision as the ones used for
> the patched-kernel servers.
> We recurrently encounter this error:
Chances are you are hitting bug 17046.
There is a patch with a fix, which will also be included in the 1.8.1
release.
Bye,
Oleg