Filesystem loop errors - duplicate inodes


Trey Dockendorf

Sep 12, 2013, 5:19:13 PM
to fhgfs...@googlegroups.com
Running a find command on a user's scratch space to get a file count:

# find . -type f -o -type l | wc -l
find: Filesystem loop detected; `./ttbar_255/SubProcesses/P2_gg_ttxgg/G20' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `./ttbar_163/SubProcesses/P3_gq_ttxqqq/G92' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.

I confirmed that each of those 2 directories does in fact share a device and inode number with the directory 2 levels up:

# stat ttbar_255/SubProcesses/P2_gg_ttxgg/G20
  File: `ttbar_255/SubProcesses/P2_gg_ttxgg/G20'
  Size: 10              Blocks: 1          IO Block: 524288 directory
Device: 18h/24d Inode: 4273206490584617158  Links: 2
<snip>
Access: 2013-09-02 13:20:05.000000000 -0500
Modify: 2013-08-20 02:01:50.000000000 -0500
Change: 2013-09-02 13:20:05.000000000 -0500

# stat ttbar_255/SubProcesses
  File: `ttbar_255/SubProcesses'
  Size: 58              Blocks: 1          IO Block: 524288 directory
Device: 18h/24d Inode: 4273206490584617158  Links: 18
<snip>
Access: 2013-09-02 13:13:04.000000000 -0500
Modify: 2013-08-20 02:02:18.000000000 -0500
Change: 2013-09-02 13:13:04.000000000 -0500

# stat ttbar_163/SubProcesses/P3_gq_ttxqqq/G92
  File: `ttbar_163/SubProcesses/P3_gq_ttxqqq/G92'
  Size: 13              Blocks: 1          IO Block: 524288 directory
Device: 18h/24d Inode: 671565646234947782  Links: 2
<snip>
Access: 2013-08-31 09:24:12.000000000 -0500
Modify: 2013-09-12 12:44:13.000000000 -0500
Change: 2013-09-12 12:44:13.000000000 -0500

# stat ttbar_163/SubProcesses
  File: `ttbar_163/SubProcesses'
  Size: 58              Blocks: 1          IO Block: 524288 directory
Device: 18h/24d Inode: 671565646234947782  Links: 18
<snip>
Access: 2013-08-31 09:14:11.000000000 -0500
Modify: 2013-09-12 12:38:37.000000000 -0500
Change: 2013-09-12 12:38:37.000000000 -0500
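
For reference, one way to enumerate every set of directories sharing a (device, inode) pair under a tree, without depending on find's loop detection, is a short script like the following. This is only a sketch (plain Python, nothing FhGFS-specific); the scratch path in the usage comment is illustrative.

```python
import os
from collections import defaultdict

def find_duplicate_dir_inodes(root):
    """Walk a tree and return {(st_dev, st_ino): [paths]} for every
    (device, inode) pair shared by more than one directory."""
    seen = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        st = os.stat(dirpath)
        seen[(st.st_dev, st.st_ino)].append(dirpath)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}

# Usage (path is illustrative):
# for (dev, ino), paths in find_duplicate_dir_inodes("/mnt/fhgfs/scratch").items():
#     print("inode %d on device %d is shared by: %s" % (ino, dev, ", ".join(paths)))
```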

The client, storage, and metadata servers are all on 2012.10.r8. The storage and metadata targets sit on top of ZFS. We recently used many concurrent rsyncs to migrate from our 2011.04 filesystem to this 2012.10 one (so we could upgrade the servers as well as move the underlying FS from XFS to ZFS); I'm not sure whether that process could have led to some form of corruption. I've been running 'fhgfs-fsck --checkfs --readOnly --runOnline' for going on 3 days now. It has been on Step 3, with the last output being "Dentry-by-ID file is present, but no corresponding dentry:", for almost 2 days.

Any advice as to what could cause this, how to fix it, or how to detect more of these would be greatly appreciated.

Thanks
- Trey

Sven Breuner

Sep 13, 2013, 4:47:51 AM
to fhgfs...@googlegroups.com
hi trey,

Trey Dockendorf wrote on 09/12/2013 11:19 PM:
> # find . -type f -o -type l | wc -l
> find: Filesystem loop detected;
> `./ttbar_255/SubProcesses/P2_gg_ttxgg/G20' has the same device number
> and inode as a directory which is 2 levels higher in the filesystem
> hierarchy.
> find: Filesystem loop detected;
> `./ttbar_163/SubProcesses/P3_gq_ttxqqq/G92' has the same device number
> and inode as a directory which is 2 levels higher in the filesystem
> hierarchy.

let me start right away with the good news: what you're seeing here are
just hash collisions of hashes that are dynamically generated by the
client. no user data is broken or corrupt.

fhgfs internally uses inode IDs that are longer than 64 bits (you can see
them when you use e.g. "fhgfs-ctl --getentryinfo /mnt/fhgfs/myfile").
while the linux kernel has supported longer IDs internally for quite a while
already, it's still not possible to pass these IDs through to userspace,
so tools like "find" and "du" still rely on the old inode numbers.
so to generate 64-bit inode numbers from something that is actually
longer than 64 bits, the client hashes the actual fhgfs entry ID.

the hash algo that we used for this seems to have generated a surprisingly
high number of collisions recently, so we've already been working on
switching to a more appropriate one.
the new one will become the default with the next release, but you can
switch to it on r8 already by adding this line to fhgfs-client.conf:
sysInodeIDStyle=md4hash64

...as the inode numbers are generated by the client on-the-fly (and not
stored on disk) when a file is accessed, you will then only need to
remount the client.
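
To put a number on how surprising these collisions are: for a uniformly distributed 64-bit hash, the birthday approximation puts the expected number of colliding pairs among n entries at about n(n-1)/2^65. A quick sanity check in plain Python (the ~70 million file count is the figure mentioned elsewhere in this thread):

```python
def expected_collisions(n, bits=64):
    """Birthday approximation: expected colliding pairs among n uniform b-bit hashes."""
    return n * (n - 1) / 2.0 / float(2 ** bits)

# With ~70 million files, a well-distributed 64-bit hash should essentially
# never collide, so observing several real collisions points at a hash that
# is not spreading the entry IDs uniformly.
print(expected_collisions(70 * 10**6))  # ~1.3e-4 expected colliding pairs
```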

would be great if you could confirm afterwards that you don't see any
more collisions.

> I've
> been running 'fhgfs-fsck --checkfs --readOnly --runOnline' for going on
> 3 days now. It has been on Step 3, with the last output as
> "Dentry-by-ID file is present, but no corresponding dentry:" for almost
> 2 days now.

regarding the long runtime: i'm still waiting for a reply from my colleague
who worked on this, so i can't say for sure right now, but i know that
there was a step in fhgfs-fsck that was handled surprisingly inefficiently
by the underlying database. that statement was already replaced by a
much faster one, but it seems the patch wasn't submitted for final
release testing yet. (so expect fsck runtime to improve with one of the
next releases.)

best regards,
sven

Trey Dockendorf

Sep 17, 2013, 12:31:45 PM
to fhgfs...@googlegroups.com, sven.b...@itwm.fraunhofer.de
Thanks for the reply and for confirming this isn't some form of corruption. Additional replies are inline.


On Friday, September 13, 2013 3:47:51 AM UTC-5, Sven Breuner wrote:

> let me start right away with the good news: what you're seeing here are
> just hash collisions of hashes that are dynamically generated by the
> client. no user data is broken or corrupt.
>
> fhgfs internally uses inode IDs that are longer than 64 bits (you can see
> them when you use e.g. "fhgfs-ctl --getentryinfo /mnt/fhgfs/myfile").
> while the linux kernel has supported longer IDs internally for quite a while
> already, it's still not possible to pass these IDs through to userspace,
> so tools like "find" and "du" still rely on the old inode numbers.
> so to generate 64-bit inode numbers from something that is actually
> longer than 64 bits, the client hashes the actual fhgfs entry ID.
>
> the hash algo that we used for this seems to have generated a surprisingly
> high number of collisions recently, so we've already been working on
> switching to a more appropriate one.
> the new one will become the default with the next release, but you can
> switch to it on r8 already by adding this line to fhgfs-client.conf:
> sysInodeIDStyle=md4hash64
>
> ...as the inode numbers are generated by the client on-the-fly (and not
> stored on disk) when a file is accessed, you will then only need to
> remount the client.
>
> would be great if you could confirm afterwards that you don't see any
> more collisions.


Right now the filesystem is in use, but I'll take a compute node offline, confirm that the other inode collisions I found still exist, then set 'sysInodeIDStyle=md4hash64' and see if anything changes.

Is the item in getentryinfo that gets hashed the 'EntryID'? If so, what's the current hashing algorithm? Would it be possible to detect potential collisions by walking the metadata inodes directory and running each entry through the algorithm, to see whether collisions would occur without running a 'find' on my mounted FhGFS filesystem? The scripted algorithm test may be faster, as I'm having latency issues on the metadata servers and a find across ~70 million files takes a very long time. (A separate thread about the latency is coming, so no need to address it here.)
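
That offline check could be sketched roughly as below, assuming the entry IDs can be dumped from the metadata directory. The client's actual hash algorithm isn't shown in this thread, so blake2b truncated to 64 bits stands in here purely as a placeholder; the sample IDs are made up in the style of the fsck output.

```python
import hashlib
from collections import defaultdict

def hash64(entry_id):
    # Placeholder 64-bit hash; substitute the client's real algorithm here.
    digest = hashlib.blake2b(entry_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def find_collisions(entry_ids):
    # Group entry IDs by their 64-bit hash; any bucket with more than one
    # member would show up as an inode collision on the client.
    buckets = defaultdict(list)
    for eid in entry_ids:
        buckets[hash64(eid)].append(eid)
    return [ids for ids in buckets.values() if len(ids) > 1]

# Example with entry-ID-like strings (made up, in the on-disk style):
sample = ["2A036-521B90C2-1", "2A037-521B90C2-1", "2BF0A2-51FC5D3D-1"]
print(find_collisions(sample))  # → [] (no collisions among these three)
```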


> I've
> been running 'fhgfs-fsck --checkfs --readOnly --runOnline' for going on
> 3 days now.  It has been on Step 3, with the last output as
> "Dentry-by-ID file is present, but no corresponding dentry:" for almost
> 2 days now.

> regarding the long runtime: i'm still waiting for a reply from my colleague
> who worked on this, so i can't say for sure right now, but i know that
> there was a step in fhgfs-fsck that was handled surprisingly inefficiently
> by the underlying database. that statement was already replaced by a
> much faster one, but it seems the patch wasn't submitted for final
> release testing yet. (so expect fsck runtime to improve with one of the
> next releases.)


Thanks. The initial scan had to be resumed, but it looks like it took ~3 days (much faster than on 2011.04, which never finished even after more than a week!). It may be worth filing a feature request for the fhgfs-fsck logs to include date+time; I observed that they only include the time.

Example:

(3) 16:21:19 Main [DGramLis] >> Listening for UDP datagrams: Port 44839
(3) 16:21:19 Main [App] >> Version: 2012.10-r8
(3) 16:21:19 Main [App] >> LocalNode: <snip>
(3) 16:21:19 Main [App] >> Usable NICs: ib0(RDMA) ib0(TCP) eth0(TCP)

The output log shows that errors were found. I have never had to run a repair so far, so any advice on how these might have occurred would be helpful. Also, are these errors repairable?

FhGFS File System Check Version : 2012.10-r8
----


--------------------------------------------------------------------
Started FhGFS fsck in forward check mode [Thu Sep 12 16:21:19 2013]
Log will be written to /var/log/fhgfs-fsck.log
Database will be saved as /var/lib/fhgfs/fhgfs-fsck.db
--------------------------------------------------------------------


Step 1: Check reachability of nodes: Finished
Step 3: Check for errors...


Target is used, but does not exist: Finished
File has a missing target in stripe pattern: Finished
Dentry-by-ID file is present, but no corresponding dentry: Finished
Dentry-by-ID file is broken or missing: Finished
Wrong owner node saved in inode: Finished
Dentry points to Inode on wrong node: Finished
Content directory without an inode: Finished
Dir Inode without an dentry pointing to it (orphaned inode): Finished
File Inode without an dentry pointing to it (orphaned inode): Finished
Chunk without an inode pointing to it (orphaned chunk): Finished
Found 9 errors.


Chunk ID: 2A036-521B90C2-1; # of targets: 1
Chunk ID: 2A037-521B90C2-1; # of targets: 1
Chunk ID: 2A038-521B90C2-1; # of targets: 1
Chunk ID: 2BF0A2-51FC5D3D-1; # of targets: 2
Chunk ID: 2BF0A3-51FC5D3D-1; # of targets: 2
Chunk ID: 2BF0A4-51FC5D3D-1; # of targets: 2


Repairing now...


Dangling directory entry: Finished
Directory Inode without a content directory: Finished
Attributes of file inode are wrong: Finished
Attributes of dir inode are wrong: Finished
Found 2 errors.


Directory ID: 20C942-51FC5D3D-1
Directory ID: 296E5-521B90C2-1


Repairing now...




Found 11 errors.


Thanks
- Trey