How to deal with inconsistencies after crash


Michael Krause

Feb 27, 2019, 4:34:41 AM
to beegfs-user
Hey everyone,

I'm very new to beegfs (7.1 on Debian Stable, backports kernel 4.19)
and I already managed to create an inconsistency that I'm unable to fix
on my own :)
During a test, one of the file servers (no buddy mirroring) crashed, and a
specific directory now contains a lot of entries that I'm unable to
delete:

**shell**

rm -rf dir
rm: cannot remove 'dir': Remote I/O error


**beegfs-meta.log**


DirectWorker1 [Directory (remove contents dir)] >> Unable to delete
dirEntryID directory: dentries/61/57/42-5C6737A5-3/#fSiDs#/. SysErr:
Directory not empty


During **beegfs-fsck** I get a lot of:


* Checking: Dentry-by-ID file is present, but no corresponding dentry ...
> Entry ID: 3D224-5C6737BA-4; Path: [<unresolved>]; Node: 4

and I try to fix it with option 4, "Recreate directory entry file (apply
for all)".
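To see how many entries are affected and on which metadata node, the fsck output can be summarized. This is only a minimal sketch: it assumes the output was captured to a file (`fsck.out` is a hypothetical name) and that the lines follow the `> Entry ID: ...; Path: ...; Node: ...` shape quoted above. The function name `list_entry_ids` is made up for illustration.

```shell
# Summarize "Entry ID" lines from saved beegfs-fsck output.
# Assumes the line shape quoted above:
#   > Entry ID: <id>; Path: <path>; Node: <n>
list_entry_ids() {
    # Print "<entry id> <node>" for each matching line read from stdin;
    # all other lines are ignored.
    sed -n 's/^> Entry ID: \([^;]*\); Path: .*; Node: \([0-9][0-9]*\).*$/\1 \2/p'
}

# Example: count affected entries per metadata node.
# fsck.out is a hypothetical file holding the captured fsck output.
# list_entry_ids < fsck.out | awk '{c[$2]++} END {for (n in c) print "node " n ": " c[n]}'
```

A per-node count like this can help confirm that the damage is confined to the node that crashed.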

During fsck beegfs-meta.log then shows a lot of:


[DirEntry (load from xattr file)] >> Found an empty dir-entry file.
(Self-healing through file removal):
dentries/8/7E/B2-5C6737A5-3/3D9A7-5C6737BA-4
[RecreateDentriesMsgEx] >> Could not read the created dentry file;
ParentID: B2-5C6737A5-3; ID: 3D9A7-5C6737BA-4




I'm not really sure how to proceed at this point and I'd appreciate any
suggestions.

cheers
Michael

James Burton

Feb 27, 2019, 8:56:49 AM
to fhgfs...@googlegroups.com
This looks like a possible hardware failure. Are you seeing any I/O errors in your system logs on the server?

Jim Burton

--
You received this message because you are subscribed to the Google Groups "beegfs-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fhgfs-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
James Burton
OS and Storage Architect
Advanced Computing Infrastructure
Clemson University Computing and Information Technology
340 Computer Court
Anderson, SC 29625

Michael Krause

Feb 27, 2019, 9:07:20 AM
to fhgfs...@googlegroups.com
Hey Jim!

Not really, no. Every zpool is healthy and there are no hardware-related
entries in the logs.

I should add that these beegfs errors are limited to the one
directory (and its subtree) that was very active when that one
storage/meta node (same machine) crashed.

And, I don't expect to recover those directories, I just want to get rid
of them and clear the inconsistency.


Michael

James Burton

Feb 27, 2019, 9:29:15 AM
to fhgfs...@googlegroups.com
This error message means that when beegfs tried to read the dentry file from the filesystem, the filesystem returned an error.

Whatever is wrong appears to be at the filesystem (zfs) level. Have you scrubbed your zpool recently? 

[RecreateDentriesMsgEx] >> Could not read the created dentry file;
ParentID: B2-5C6737A5-3; ID: 3D9A7-5C6737BA-4
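For the scrub question, a rough sketch of how one might start a scrub and then look only at the relevant lines of the status output. The pool name `tank` and the helper name `summarize_zpool_status` are hypothetical; `zpool scrub` and `zpool status` are the standard ZFS commands, and the filter assumes the usual `pool:`, `state:`, `scan:` and `errors:` labels in the status output.

```shell
# Reduce `zpool status` output to the lines that matter for a quick health check.
summarize_zpool_status() {
    # Keep the pool name, state, any status message, last scan result,
    # and the error summary; drop the per-device config table.
    grep -E '^[[:space:]]*(pool|state|status|scan|errors):'
}

# Usage ("tank" is a hypothetical pool name):
# zpool scrub tank                              # start a scrub; it runs in the background
# zpool status tank | summarize_zpool_status    # check progress and results later
```

If the `errors:` line reports anything other than no known data errors once the scrub has finished, `zpool status -v` lists the affected files.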



Jérémie Sebban

Feb 28, 2019, 6:05:39 AM
to beegfs-user
Hello Michael,

I had similar errors on a much older version of Fhgfs (2014.01), showing up when I was trying to delete some folders.
This appeared to have been triggered by a shortage of inodes on the metadata server well before (a few months) the deletion errors happened. What is the status of your metadata servers? beegfs-df will show the ratio of free inodes for each metadata target. Checking the underlying metadata storage might be worthwhile too.
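As a complement to beegfs-df, inode usage on the underlying metadata filesystem can be checked with plain `df -i`. A minimal sketch, assuming POSIX-style `df -P -i` output (columns: Filesystem, Inodes, IUsed, IFree, IUse%, Mounted on); the function name `flag_low_inodes` and the path in the usage line are hypothetical. Some filesystems report `-` instead of a percentage, and those lines are skipped here.

```shell
# Flag filesystems whose inode usage is at or above a threshold (percent).
# Reads `df -P -i` style output on stdin.
flag_low_inodes() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
        sub(/%/, "", $5)                     # strip the percent sign
        if ($5 + 0 >= t) print $6 ": " $5 "% of inodes used"
    }'
}

# Usage (/data/beegfs/meta is a hypothetical metadata target path):
# df -P -i /data/beegfs/meta | flag_low_inodes 90
```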

To fix it, I first moved all "locked" directories to a temporary folder and then ran fhgfs-fsck with repair. I could then delete them.
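The move / fsck / delete sequence above could be sketched roughly as follows. This is a dry-run illustration rather than a tested procedure: all paths and the helper names `run` and `quarantine_and_repair` are hypothetical, and the `beegfs-fsck --checkfs` invocation is an assumption that should be checked against `beegfs-fsck --help` for the installed version (on 2014.01 the tool was called fhgfs-fsck). By default the script only prints the commands it would run.

```shell
# Dry-run sketch of the move / fsck / delete sequence.
# Set DRY_RUN=0 only after reviewing the printed commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

quarantine_and_repair() {
    quarantine="$1"; shift
    run mkdir -p "$quarantine"
    for dir in "$@"; do
        run mv "$dir" "$quarantine/"     # move each stuck directory aside
    done
    # Assumed invocation; verify the options for your version.
    run beegfs-fsck --checkfs
    run rm -rf "$quarantine"             # delete once fsck has repaired the entries
}

# Usage (hypothetical paths on the BeeGFS mount):
# quarantine_and_repair /mnt/beegfs/quarantine /mnt/beegfs/dir
```

In a real run the fsck step is interactive, so the final delete would more sensibly be done by hand after checking the fsck results.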