fhgfs performance problem on storage device

Bernd Melchers

Apr 25, 2012, 8:42:31 AM
to fhgfs-user
Writing single large files to the fhgfs file system results in huge
numbers of small (and very small) files (chunks) being written to the
underlying file system on the storage nodes. This degrades the
performance of fhgfs considerably. Any suggestions?
Our configuration:
- Scientific Linux 6.1 (kernel 2.6.32-131.6.1.el6.x86_64)
- fhgfs 2011.04.r16-el6
- RDMA over QLogic InfiniBand

Thank you

Sven Breuner

Apr 25, 2012, 10:03:33 AM
to fhgfs...@googlegroups.com
Hi Bernd,

Bernd Melchers wrote on 04/25/2012 02:42 PM:
> writing single large files in the fhgfs file system result in huge
> amounts of small (and very small) files (chunks) written to the
> underlying file system on the storage nodes.

I'm not sure whether I understand your situation correctly. It sounds
like you have the impression that fhgfs would create multiple
chunk-files on a storage target for a single user file. That is not the
case. fhgfs creates only a single chunk file on a storage target for a
single user file.
Thus, for a single 100GB file like "/mnt/fhgfs/mybigfile" that is
striped across 4 storage targets, fhgfs will create exactly one 25GB
chunk file on each of the 4 storage targets.
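
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, not fhgfs code: it assumes simple round-robin striping that starts at the first target, a 512 KiB chunk size, and it reads "100GB" as 100 GiB so the numbers divide evenly. The function name `chunk_file_sizes` is made up for this sketch.

```python
def chunk_file_sizes(file_size, num_targets, chunk_size=512 * 1024):
    """Bytes stored in the single chunk file on each storage target.

    Assumes round-robin striping: stripe chunk k lands on target k % num_targets.
    """
    full_chunks, tail = divmod(file_size, chunk_size)
    per_target, extra = divmod(full_chunks, num_targets)
    # Every target gets per_target full chunks; the first `extra` targets get one more.
    sizes = [(per_target + (1 if i < extra else 0)) * chunk_size
             for i in range(num_targets)]
    if tail:
        # A trailing partial chunk goes to the next target in the rotation.
        sizes[full_chunks % num_targets] += tail
    return sizes


GIB = 2 ** 30
sizes = chunk_file_sizes(100 * GIB, num_targets=4)
print([s // GIB for s in sizes])  # one entry per target, in GiB
```

Each list entry corresponds to one chunk file, so the length of the list is also the number of chunk files the user file occupies under these assumptions.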

So is it possible that you are not referring to multiple small chunk
files on the storage targets, but rather to multiple small IO requests
to the underlying storage device?
That could indicate a missing RAID configuration in the underlying xfs
on the storage servers, if you're using xfs there. Usually, the
corresponding xfs RAID settings are made when you run mkfs.xfs, but they
can also be applied later at mount time.
For this, you might want to check out the "sunit" and "swidth" mount
parameters of xfs in the "mount" man page and you also might want to
take a look at the other xfs and storage tuning recommendations in the
fhgfs server tuning guide here:
http://www.fhgfs.com/wiki/wikka.php?wakka=ServerTuning
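
As a rough illustration of how those two mount values relate to the RAID geometry: the xfs mount options take both values in 512-byte units, with the stripe width being the stripe unit multiplied by the number of data disks. The sketch below just does that arithmetic. The geometry used in the example (64 KiB RAID chunk, 10 data disks) is hypothetical and not taken from this thread, and the helper name is made up.

```python
SECTOR_BYTES = 512  # xfs mount options express sunit/swidth in 512-byte units


def xfs_stripe_mount_values(raid_chunk_bytes, data_disks):
    """Return (sunit, swidth) for the xfs mount options.

    raid_chunk_bytes: per-disk RAID chunk (stripe unit) size in bytes
    data_disks:       number of data-bearing disks (parity disks excluded)
    """
    if raid_chunk_bytes % SECTOR_BYTES:
        raise ValueError("RAID chunk size must be a multiple of 512 bytes")
    sunit = raid_chunk_bytes // SECTOR_BYTES
    swidth = sunit * data_disks
    return sunit, swidth


# Hypothetical geometry, for illustration only.
sunit, swidth = xfs_stripe_mount_values(64 * 1024, data_disks=10)
print(f"mount -o sunit={sunit},swidth={swidth} ...")
```

The real values have to come from the controller's actual RAID level and chunk size.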

Hope this helps and best regards,
Sven Breuner
Fraunhofer

Bernd Melchers

Apr 25, 2012, 10:45:55 AM
to fhgfs-user


> Thus, for a single 100GB file like "/mnt/fhgfs/mybigfile" that is
> striped across 4 storage targets, fhgfs will create exactly one 25GB
> chunk file on each of the 4 storage targets.

OK, maybe my impression was misled by other users writing short
files to the same device at the same time. The xfs configuration is
"good" and the local xfs performance is also good.

Our configuration:
- three storage nodes
- two xfs file systems per storage node
- 36 1TB SAS 7.2k disks per file system (MD1200 enclosures, Dell H800 Controller)

Problem:
Writing locally to the xfs file system with dd, I get 1.2-1.5 GB/sec per
file system and 3-3.5 GB/sec per storage node.

But writing with x fhgfs clients to the user file system seems to be
limited to ~900 MB/sec per storage node (< 500 MB/sec per xfs file
system), where I expected >= 3 GB/sec.
This is a large difference and I am looking for the bottleneck.

Is it possible to get the names of all chunk files for a given user
file? If all user files are equally distributed across the storage
nodes, why do the file systems differ in the amount of data they hold
and in the number of inodes used?
storage01:
Filesystem    1K-blocks       Used   Available Use% Mounted on
/dev/sdc    29284640768 4096847952 25187792816  14% /fhgfs/storage1
/dev/sdd    29284640768 4748381864 24536258904  17% /fhgfs/storage2

storage02:
Filesystem    1K-blocks       Used   Available Use% Mounted on
/dev/sdc    29284640768 3718309936 25566330832  13% /fhgfs/storage1
/dev/sdd    29284640768 4157668568 25126972200  15% /fhgfs/storage2

storage03:
Filesystem    1K-blocks       Used   Available Use% Mounted on
/dev/sdc    29284640768 3709046532 25575594236  13% /fhgfs/storage1
/dev/sdd    29284640768 4048550512 25236090256  14% /fhgfs/storage2

and the inodes:

storage01:
Filesystem      Inodes   IUsed      IFree IUse% Mounted on
/dev/sdc    5857345536  396035 5856949501    1% /fhgfs/storage1
/dev/sdd    5857345536 1216478 5856129058    1% /fhgfs/storage2

storage02:
Filesystem      Inodes   IUsed      IFree IUse% Mounted on
/dev/sdc    5857345536  398003 5856947533    1% /fhgfs/storage1
/dev/sdd    5857345536 1213311 5856132225    1% /fhgfs/storage2

storage03:
Filesystem      Inodes   IUsed      IFree IUse% Mounted on
/dev/sdc    5857345536 1213245 5856132291    1% /fhgfs/storage1
/dev/sdd    5857345536  397627 5856947909    1% /fhgfs/storage2

fhgfs:
Filesystem       1K-blocks        Used    Available Use% Mounted on
fhgfs_nodev   175707844608 24483033184 151224811424  14% /scratch

/etc/fhgfs/fhgfs-storage.conf is:
...
tuneNumWorkers = 48
tuneWorkerBufSize = 4m
tuneFileReadSize = 128k
tuneFileWriteSize = 128k
...

and /etc/fhgfs/fhgfs-client.conf is:
...
connUseSDP = false
connUseRDMA = true
connRDMABufSize = 65536
connRDMABufNum = 160
connMaxInternodeNum = 6
...
tuneNumWorkers = 0
tunePreferredMetaFile =
tunePreferredStorageFile =
tuneFileCacheType = buffered
tuneRemoteFSync = true
tuneUseGlobalFileLocks = false
...

Sven Breuner

Apr 25, 2012, 11:07:44 AM
to fhgfs...@googlegroups.com
Hi Bernd,

just as a short answer regarding the number of used inodes: fhgfs does
not create a chunk file on all storage targets for user files that are
smaller than the configured fhgfs chunk size, which is 512KB by default.
So it is normal that you have different numbers of files on the storage
targets.
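
To make the effect on inode counts concrete, here is a small sketch that counts how many chunk files a user file of a given size would produce. It assumes the 512KB default chunk size mentioned above (taken as 512 KiB) and a hypothetical stripe width of 4 targets. How fhgfs handles empty files is not covered in this thread, so returning 0 for them is purely an assumption of the sketch.

```python
def chunk_files_created(file_size, stripe_targets, chunk_size=512 * 1024):
    """Number of storage targets that receive a chunk file for one user file."""
    if file_size <= 0:
        return 0  # assumption: empty files are not modelled here
    chunks_needed = -(-file_size // chunk_size)  # integer ceiling division
    # A file cannot touch more targets than its stripe pattern contains.
    return min(stripe_targets, chunks_needed)


KIB = 1024
for size in (100 * KIB, 600 * KIB, 10 * 1024 * KIB):
    print(size // KIB, "KiB ->", chunk_files_created(size, stripe_targets=4), "chunk file(s)")
```

A workload with many files below the chunk size therefore adds one inode on a single target per file, while large files add one inode on every target in their stripe pattern.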

However, since you have a support contract, I will create a support
ticket for this to further investigate any performance issues in
cooperation with your cluster vendor.

Best regards,
Sven Breuner
Fraunhofer

nesk...@gmail.com

Apr 26, 2012, 9:51:47 AM
to fhgfs...@googlegroups.com
Hi,

Could this be the client's InfiniBand limit?

Regards,

Bernd Melchers

Apr 26, 2012, 10:27:06 AM
to fhgfs...@googlegroups.com
> Hi,
>
> This looks like client's infiniband limit?

No. The InfiniBand limit is 40 Gbit/sec. The PCIe limit (for the IB HCA) could be 2 GB/sec.
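
These figures can be put side by side with a quick unit conversion. The sketch below is back-of-the-envelope arithmetic only: it uses the 40 Gbit/sec and 2 GB/sec figures from this message and the ~900 MB/sec per storage node reported earlier in the thread. It ignores link encoding and protocol overhead, which is an assumption that makes the computed ceiling optimistic.

```python
def gbit_to_gbyte(gbit_per_sec):
    """Convert Gbit/sec to GB/sec (8 bits per byte, no encoding overhead)."""
    return gbit_per_sec / 8


ib_limit = gbit_to_gbyte(40)   # InfiniBand figure from the message, in GB/sec
pcie_limit = 2.0               # PCIe figure for the HCA from the message, in GB/sec
observed = 0.9                 # ~900 MB/sec per storage node, from earlier in the thread

# The slower of the two links bounds what a single adapter can move.
link_ceiling = min(ib_limit, pcie_limit)
print(f"link ceiling: {link_ceiling} GB/sec, observed: {observed} GB/sec")
print(f"observed / ceiling: {observed / link_ceiling:.0%}")
```

Under these assumptions the observed rate sits well below even the lower PCIe figure, which is consistent with the reply that the link itself is not the limit.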

Kind regards
Bernd Melchers

--
Archiv- und Backup-Service | fab-s...@zedat.fu-berlin.de
Freie Universität Berlin | Tel. +49-30-838-55905