[Lustre-discuss] OSS Cache Size for read optimization


Jordan Mendler

Apr 2, 2009, 6:17:59 PM
to lustre-...@lists.lustre.org
Hi all,

I deployed Lustre on some legacy hardware and as a result my (4) OSS's each have 32GB of RAM. Our workflow is such that we are frequently rereading the same 15GB indexes over and over again from Lustre (they are striped across all OSS's) by all nodes on our cluster. As such, is there any way to increase the amount of memory that either Lustre or the Linux kernel uses to cache files read from disk by the OSS's? This would allow much of the indexes to be served from memory on the OSS's rather than disk.

I see a lustre.memused_max = 48140176 parameter, but I'm not sure what it does. If it matters, my setup is such that each of the 4 OSS's serves 1 OST consisting of a software RAID10 across 4 SATA disks internal to that OSS.

Any other suggestions for tuning for fast reads of large files would also be greatly appreciated.

Thanks so much,
Jordan

Cliff White

Apr 3, 2009, 2:44:18 PM
to Jordan Mendler, lustre-...@lists.lustre.org
Jordan Mendler wrote:
> [...] is there any way to increase the amount of memory that either
> Lustre or the Linux kernel uses to cache files read from disk by the
> OSS's? This would allow much of the indexes to be served from memory
> on the OSS's rather than disk. [...]

Current Lustre does not cache data on the OSTs at all; all I/O goes
directly to disk. Future Lustre releases will provide an OST cache.

For now, you can increase the amount of data cached on clients, which
might help a little. Client caching is set with
/proc/fs/lustre/osc/*/max_dirty_mb.
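Assuming a client of this era of Lustre, a minimal sketch for inspecting and raising that limit might look like the following (the value 64 is illustrative, not a recommendation, and the tunable only affects dirty, i.e. write-side, data):

```shell
# Sketch: raise the per-OSC dirty-cache limit on a Lustre client.
# Requires root; the paths exist only where a Lustre client is mounted.
new_mb=64                                   # illustrative value
if [ -d /proc/fs/lustre/osc ]; then
    for f in /proc/fs/lustre/osc/*/max_dirty_mb; do
        printf '%s was ' "$f"; cat "$f"     # show the current limit
        echo "$new_mb" > "$f"               # set the new limit
    done
fi
```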

cliffw


_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Lundgren, Andrew

Apr 3, 2009, 2:52:56 PM
to Cliff White, lustre-...@lists.lustre.org
The parameter is called "dirty": is that write cache only, or read-write?

>
> Current Lustre does not cache on OSTs at all. All IO is direct.
> Future Lustre releases will provide an OST cache.
>
> For now, you can increase the amount of data cached on clients, which
> might help a little. Client caching is set with
> /proc/fs/lustre/osc/*/max_dirty_mb.
>

Oleg Drokin

Apr 3, 2009, 3:31:19 PM
to Lundgren, Andrew, Lustre User Discussion Mailing List
Yes, it is for dirty-cache limiting on a per-OSC basis.
There is also /proc/fs/lustre/llite/*/max_cached_mb, which regulates how
much cached data each client can hold (the default is 3/4 of RAM).
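On the read side, a rough sketch of capping a client's cache at half of RAM instead of the 3/4 default (the fraction is illustrative; the write at the end must run on a Lustre client, as root):

```shell
# Sketch: compute half of physical RAM in MB for use as max_cached_mb.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
half_mb=$((mem_kb / 1024 / 2))              # integer arithmetic, in MB
echo "would set max_cached_mb to ${half_mb}"
# On a Lustre client (one llite directory per mount), as root:
# for f in /proc/fs/lustre/llite/*/max_cached_mb; do
#     echo "$half_mb" > "$f"
# done
```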

Andreas Dilger

Apr 6, 2009, 2:09:36 AM
to Jordan Mendler, lustre-...@lists.lustre.org
On Apr 02, 2009 15:17 -0700, Jordan Mendler wrote:
> I deployed Lustre on some legacy hardware and as a result my (4) OSS's each
> have 32GB of RAM. Our workflow is such that we are frequently rereading the
> same 15GB indexes over and over again from Lustre (they are striped across
> all OSS's) by all nodes on our cluster. As such, is there any way to
> increase the amount of memory that either Lustre or the Linux kernel uses to
> cache files read from disk by the OSS's? This would allow much of the
> indexes to be served from memory on the OSS's rather than disk.

With Lustre 1.8.0 (in late release testing; you could grab
v1_8_0_RC5 from CVS for testing[*]) there is OSS server-side caching
of read and just-written data. There is a tunable,
/proc/fs/lustre/obdfilter/*/readcache_max_filesize, that limits the
maximum size of file cached on the OSS, so that small files can be
cached while large files will not wipe out the read cache.

Set readcache_max_filesize just large enough to hold your index files
(which are hopefully not too large individually) to maximize your
cache retention. While the cache eviction is LRU, it may be that
at high IO rates your working set would still be evicted from RAM
if too many other files fall within the cache file size limit.
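As a sizing sketch (assuming the limit is compared against the per-OST object size, since the 15 GB indexes here are striped over 4 OSTs; if it is instead compared against the full file size, use the whole 15 GB):

```shell
# Hypothetical sizing for readcache_max_filesize on each OSS.
index_bytes=$((15 * 1024 * 1024 * 1024))     # 15 GB index, figure from the thread
stripe_count=4                               # striped across all 4 OSTs
per_ost_bytes=$((index_bytes / stripe_count))
echo "per-OST object size: ${per_ost_bytes} bytes"
# On each OSS (Lustre 1.8+ servers only), as root:
# for f in /proc/fs/lustre/obdfilter/*/readcache_max_filesize; do
#     echo "$per_ost_bytes" > "$f"
# done
```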

[*] Note that v1_8_0_RC5 is missing the fix for bug 18659 so is not at
all safe to use on the MDS, v1_8_0_RC6 will have that fix, as does b1_8.

> I see a lustre.memused_max = 48140176 parameter, but not sure what that
> does. If it matters, my setup is such that each of the 4 OSS's serves 1 OST
> that consists of a software RAID10 across 4 SATA disks internal to that OSS.

That is just reporting the total amount of RAM ever used by the
Lustre code itself (48MB in this case), and has nothing to do with
the cached data.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
