some files are listed twice by ls in directory containing 1 million files


Christian Iseli

Jan 16, 2012, 6:09:19 AM
to fhgfs...@googlegroups.com
Hello,

I'm currently performing some stress tests on fhgfs.  The setup is the following:

Node 'fhgfs':
 - RAID10 of 6 SSD disks of 64G = 157 G usable space for metadata, formatted according to recommended ext4 and options
 - running admon, mgmtd and meta daemons

Nodes 'fhgfs01' and 'fhgfs02':
 - each 4 RAID6 LUNs of 6 disks of 1T, located in a (somewhat old) xyratex unit (2 controllers, each with 2 SAN ports), formatted as XFS with external journal
 - running storage daemon
 - the total usable space is reported as 29T

Nodes 'fhgfs03' and 'fhgfs04':
 - running helperd and client daemons

From fhgfs04 I have created 1 directory named test1Mfiles in the fhgfs filesystem (/scratch/fhgfs) and populated this directory with 1 million files of 4 megabytes.  The data in each file is randomly generated and the sha1 sum of the content of the file is used as the file name, for checking purposes.
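For readers who want to reproduce this kind of test, the naming scheme described above can be sketched as follows. This is a minimal illustration, not the script actually used for the test: the function names `create_checked_file` and `verify_checked_file` are hypothetical, and the `.bin` suffix is taken from the listings in this post.

```python
import hashlib
import os
import tempfile


def create_checked_file(directory, size):
    """Write `size` random bytes to <sha1-of-content>.bin and return the path."""
    data = os.urandom(size)
    name = hashlib.sha1(data).hexdigest() + ".bin"
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def verify_checked_file(path):
    """Return True if the SHA-1 of the file content matches the file name."""
    expected = os.path.basename(path)[:-len(".bin")]
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


if __name__ == "__main__":
    # Small demo in a throwaway directory; the test above used 4 MiB files.
    with tempfile.TemporaryDirectory() as demo_dir:
        p = create_checked_file(demo_dir, 4096)
        print(os.path.basename(p), verify_checked_file(p))
```

Because the name encodes the expected checksum, any file can be verified later without a separate manifest.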

Creating the 1M files took a bit more than 1 day and 5 hours:
[root@fhgfs04 ~ ] tail -1 /scratch/fhgfs/ls_lt_20120114_1840.txt
-rw-r--r-- 1 root root 4194304 Jan 13 08:16 f3f14f413d0dbefbea20a1b3c594d0498ce01f35.bin
[root@fhgfs04 ~ ] head -2 /scratch/fhgfs/ls_lt_20120114_1840.txt
total 4096012288
-rw-r--r-- 1 root root 4194304 Jan 14 13:42 282bd46266151915782f1495ecb39a281145ec3f.bin

The problem is that ls shows 1000003 files instead of 1000000 as it should, and the reason is that it reports 3 files twice:
[root@fhgfs04 ~ ] uniq -d /scratch/fhgfs/ls_lt_20120114_1840.txt
-rw-r--r-- 1 root root 4194304 Jan 14 04:29 9e4cbc74b7f679cd001d67e4a02bed11a7f0c89f.bin
-rw-r--r-- 1 root root 4194304 Jan 14 04:26 6342e56a2a0ccf3114b8f926dc356faea5f750a6.bin
-rw-r--r-- 1 root root 4194304 Jan 14 02:04 f49140517dfad2d1ab278571ff088c709b36c9be.bin
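Note that `uniq -d` only reports duplicates that sit on adjacent lines, so it depends on the order of the listing. An order-independent check can be sketched as below. The helper names are hypothetical, and the parser assumes `ls -l` style lines with nine whitespace-separated fields and file names without spaces, which holds for the `<sha1>.bin` names used here.

```python
from collections import Counter


def names_from_long_listing(lines):
    """Extract the file name (last field) from `ls -l` style lines."""
    names = []
    for line in lines:
        fields = line.split()
        # Skip the "total N" header and blank lines; a regular entry has
        # permissions, links, owner, group, size, month, day, time, name.
        if len(fields) >= 9:
            names.append(fields[-1])
    return names


def find_duplicate_names(names):
    """Return {name: count} for every name that appears more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}
```

For example, `find_duplicate_names(names_from_long_listing(open(listing_path)))` would report every name listed more than once, regardless of where the repeated entries appear in the file (`listing_path` being whatever saved listing is at hand).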

I then went on to read each file, check that its sha1sum matches its name, and remove it.  All went fine.  Removing each of the above 3 files once made both of its entries disappear.  This process took a bit less than 21 hours:
Started at Sun Jan 15 13:23:06 CET 2012
Done at Mon Jan 16 10:07:55 CET 2012
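As a quick cross-check of the "a bit less than 21 hours" figure, the elapsed time and the resulting aggregate rate can be worked out from the two timestamps. This is only back-of-the-envelope arithmetic: it assumes exactly 1,000,000 files of 4 MiB (4194304 bytes) each, and the rate covers reading, checksumming and removing together.

```python
from datetime import datetime

FILE_COUNT = 1_000_000   # files in the test directory
FILE_SIZE_MIB = 4        # 4194304 bytes per file

# Both timestamps are CET, so no time zone conversion is needed.
start = datetime(2012, 1, 15, 13, 23, 6)
end = datetime(2012, 1, 16, 10, 7, 55)

elapsed = end - start
seconds = elapsed.total_seconds()
files_per_second = FILE_COUNT / seconds
mib_per_second = files_per_second * FILE_SIZE_MIB

print(elapsed)  # 20:44:49
print(f"{files_per_second:.1f} files/s, {mib_per_second:.1f} MiB/s")
```

That works out to roughly 13 files per second, or a bit over 50 MiB/s of data read and verified.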

Doing an ls -lt on the 1 million files took 10 minutes.  Doing a simple ls took 24 seconds.  Is such a huge difference expected when requesting more info from ls?

Bernd Schubert

Jan 19, 2012, 11:04:28 AM
to fhgfs...@googlegroups.com
Hello Christian,

On 01/16/2012 12:09 PM, Christian Iseli wrote:
> Hello,
>

[...]

>
> From fhgfs04 I have created 1 directory named test1Mfiles in the fhgfs
> filesystem (/scratch/fhgfs) and populated this directory with 1 million
> files of 4 megabytes. The data in each file is randomly generated and
> the sha1 sum of the content of the file is used as the file name, for
> checking purposes.
>
> Creating the 1M files took a bit more than 1 day and 5 hours:

You can probably speed this up by setting the following in your

/etc/fhgfs/fhgfs-client-autobuild.conf:

buildArgs=-j8 FHGFS_OPENTK_IBVERBS=1 FHGFS_INTENT=1


As the intent feature introduces some incompatibilities within the
2011.04 release, we do not enable it by default. However, if you have
recent versions (at least 2011.04-r5) installed on both the server and
the client side, you can enable it. The upcoming 2012 release will have
it enabled by default.


> [root@fhgfs04 ~ ] tail -1 /scratch/fhgfs/ls_lt_20120114_1840.txt
> -rw-r--r-- 1 root root 4194304 Jan 13 08:16
> f3f14f413d0dbefbea20a1b3c594d0498ce01f35.bin
> [root@fhgfs04 ~ ] head -2 /scratch/fhgfs/ls_lt_20120114_1840.txt
> total 4096012288
> -rw-r--r-- 1 root root 4194304 Jan 14 13:42
> 282bd46266151915782f1495ecb39a281145ec3f.bin
>
> The problem is that ls shows 1000003 files instead of 1000000 as it
> should, and the reason is that it reports 3 files twice:
> [root@fhgfs04 ~ ] uniq -d /scratch/fhgfs/ls_lt_20120114_1840.txt
> -rw-r--r-- 1 root root 4194304 Jan 14 04:29
> 9e4cbc74b7f679cd001d67e4a02bed11a7f0c89f.bin
> -rw-r--r-- 1 root root 4194304 Jan 14 04:26
> 6342e56a2a0ccf3114b8f926dc356faea5f750a6.bin
> -rw-r--r-- 1 root root 4194304 Jan 14 02:04
> f49140517dfad2d1ab278571ff088c709b36c9be.bin
>
> I then went on to read each file, compare that its sha1sum matches its
> name and remove it. All went fine. Removing once the above 3 files made
> both entries disappear. This process took a bit less than 21 hours:
> Started at Sun Jan 15 13:23:06 CET 2012
> Done at Mon Jan 16 10:07:55 CET 2012

Yes, this is a known problem, unfortunately not on our side. The
underlying issue is in ext4, which does not behave as POSIX requires.
We have been trying to get this fixed in upstream kernel versions since
the summer. Unfortunately, the ext4 maintainer has simply ignored us
since then. Only recently, and at the request of several other groups
suffering from the very same problem, did he say that he had been too
busy to review the patches so far. Hopefully the patches will go into
linux-3.4, which would then also allow RedHat to take them over into
their enterprise kernel versions.

http://comments.gmane.org/gmane.linux.file-systems/60157


>
> Doing an ls -lt on the 1 million files took 10 minutes. Doing a simple
> ls took 24 seconds. Is it expected to see such a huge difference when
> requesting more info from ls ?

A simple 'ls' translates into a readdir(), which only needs to list
directory entries. Doing an 'ls -l' requires an additional stat() call
for each and every file found by readdir(). As this is an operation
over the network, this takes some time, of course. In order to work
around another ext4 issue, we are also working on an improved metadata
layout, which hopefully will also speed up 'ls -l' a bit.
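The difference between the two access patterns can be illustrated with a small sketch. This is not how ls itself is implemented; it only mirrors the pattern described above, using Python's os.listdir() for the names-only scan and one os.stat() per entry for the long listing. The function names and the demo directory helper are hypothetical.

```python
import os
import tempfile
import time


def list_names(path):
    """Names only: a single directory scan, comparable to a plain 'ls'."""
    return os.listdir(path)


def list_with_stat(path):
    """Names plus size and mtime: one extra stat() per entry, like 'ls -l'."""
    rows = []
    for name in os.listdir(path):
        st = os.stat(os.path.join(path, name))
        rows.append((name, st.st_size, st.st_mtime))
    return rows


def time_call(func, path):
    """Run func(path) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def make_demo_dir(count):
    """Create a temporary directory holding `count` empty files (demo only)."""
    path = tempfile.mkdtemp()
    for i in range(count):
        with open(os.path.join(path, f"file{i:04d}.bin"), "wb"):
            pass
    return path
```

Timing `list_names` against `list_with_stat` on the same directory shows the per-file overhead. On a local disk the gap is usually modest, whereas on a network filesystem each stat() may cost a round trip to the metadata server, so the total grows with the number of files.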


Hope it helps,
Bernd
