[Lustre-discuss] Problems with multiple lustre filesystems

J Alejandro Medina

Sep 11, 2011, 9:09:29 PM
to lustre-...@lists.lustre.org
Hi to all,

Our organization has recently configured two Lustre filesystems on a Linux cluster. Both filesystems are connected to the same 10 GbE VLAN. We have tested both filesystems with IOzone and other benchmarking software without errors.

When copying data from one filesystem to the other, we experience excessive broadcast messages. The network slows to a crawl until both filesystems stop responding.

If we test each filesystem separately, we do not see this behavior.

Any ideas?
--
J. Alejandro Medina

THIELL Stephane

Sep 13, 2011, 12:20:48 PM
to J Alejandro Medina, lustre-...@lists.lustre.org
J Alejandro Medina wrote:

> When copying data from one filesystem to the other, we experience
> excessive broadcast messages. The network slows to a crawl until
> both filesystems stop responding.
>
> If we test each filesystem separately, we do not see this behavior.
One idea would be to reduce your Lustre client's max_cached_mb value
(a per-filesystem setting). By default it is set to 2/3 of available
system memory, which is not optimal when mounting multiple Lustre
filesystems on the same node, especially when copying data from one
to the other.

see /proc/fs/lustre/llite/*/max_cached_mb
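For instance, a minimal sketch of inspecting and lowering the limit
(the 512 MB value is only an illustration, and the exact interface may
differ by Lustre version):

    # Show the current per-filesystem cache limit on this client
    cat /proc/fs/lustre/llite/*/max_cached_mb

    # Lower the limit (in MB) for every mounted Lustre filesystem;
    # pick a value that leaves room for both filesystems' caches
    for f in /proc/fs/lustre/llite/*/max_cached_mb; do
        echo 512 > "$f"
    done

    # Equivalently, on versions that support it:
    # lctl set_param llite.*.max_cached_mb=512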

HTH,
Stephane Thiell
CEA


gregoir...@bull.net

Sep 14, 2011, 3:10:32 AM
to THIELL Stephane, J Alejandro Medina, lustre-...@lists.lustre.org

> From: THIELL Stephane <stephan...@cea.fr>

>
> J Alejandro Medina wrote:
> > When copying data from one filesystem to the other, we experience
> > excessive broadcast messages. The network slows to a crawl until
> > both filesystems stop responding.
> >
> > If we test each filesystem separately, we do not see this behavior.
> One idea would be to reduce your Lustre client's max_cached_mb value
> (a per-filesystem setting). By default it is set to 2/3 of available
> system memory, which is not optimal when mounting multiple Lustre
> filesystems on the same node, especially when copying data from one
> to the other.
>
> see /proc/fs/lustre/llite/*/max_cached_mb
>


Looking at the code (Lustre 2.0), it appears the max_cached_mb tunable has no effect.
I found LU-141, "port lustre client page cache shrinker back to clio", which tracks the problem.
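A rough way to check this on a client (assuming /mnt/lustre1 and
/mnt/lustre2 as placeholder mount points for the two filesystems) is
to lower the limit, copy a file larger than it, and watch whether the
page cache grows far past the configured value:

    # Baseline page cache size (coarse: this counts all filesystems,
    # so run the check on an otherwise quiet node)
    grep ^Cached: /proc/meminfo

    # Copy a file noticeably larger than max_cached_mb
    cp /mnt/lustre1/bigfile /mnt/lustre2/

    # If the tunable works, "Cached" should stay roughly bounded;
    # unbounded growth suggests the limit is being ignored
    grep ^Cached: /proc/meminfo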

--
Grégoire PICHON
Bull
