BeeGFS FSCK error


dwakefi2.gmu

May 25, 2020, 11:20:05 AM
to beegfs-user
Hi,
Can anyone help with an error I am seeing when running beegfs-fsck?

One folder in my BeeGFS mount is missing. I don't actually need the folder, but I would like this error to go away:

 ls /mnt/beegfs/
ls: cannot access /mnt/beegfs/test-no-mirror: No such file or directory
(other folders list correctly)


What I ran:
beegfs-fsck --checkfs  --ignoreDBDiskSpace --logLevel=10 --automatic --forceRestart --overwriteDbFile

Error from beegfs-fsck.log:
[App (component exception handler)] >> The component [Worker32] encountered an unrecoverable error. [SysErr: Success] Exception message: Read error occured while fetching chunks from node; nodeID: MMR01.company.xyz


Error from beegfs-storage.log on mmr01:
(2) May25 10:37:25 ChunkFetcherSlave-102 [ChunkFetcherSlave.cpp:127] >> Could not stat directory. path: /beegfs/beegfs2/buddymir/u2C16F3F4/5EBF/6; targetID: 102; sysErr: No such file or directory (2)
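If there are more entries like this one, it may help to pull just the target ID and path out of the storage log. The following is a minimal sketch that assumes the log lines follow the format shown above; `stat_failures` is a hypothetical helper name, and the log location in the example comment is an assumption:

```shell
# stat_failures: read beegfs-storage.log lines on stdin and print
# "targetID path" for every "Could not stat directory" entry.
# Assumes the message layout shown in the log excerpt above.
stat_failures() {
    sed -n 's/.*Could not stat directory\. path: \([^;]*\); targetID: \([0-9]*\);.*/\2 \1/p'
}

# Example (log location is an assumption, adjust to your setup):
# stat_failures < /var/log/beegfs-storage.log | sort -u
```

Lines that do not match the pattern are dropped, so the output lists only the affected target and chunk directory pairs.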


State info for all nodes:
TargetID     Reachability  Consistency   NodeID
========     ============  ===========   ======
       1           Online         Good   beegfs-meta MMR01.company.xyz [ID: 1]
       2           Online         Good   beegfs-meta MMR02.company.xyz [ID: 2]
       3           Online         Good   beegfs-meta MMR03.company.xyz [ID: 3]
       4           Online         Good   beegfs-meta MMR04.company.xyz [ID: 4]
       5           Online         Good   beegfs-meta MMR05.company.xyz [ID: 5]
       6           Online         Good   beegfs-meta MMR06.company.xyz [ID: 6]
TargetID     Reachability  Consistency   NodeID
========     ============  ===========   ======
     101           Online         Good   beegfs-storage MMR01.company.xyz [ID: 1]
     102           Online         Good   beegfs-storage MMR01.company.xyz [ID: 1]
     201           Online         Good   beegfs-storage MMR02.company.xyz [ID: 2]
     202           Online         Good   beegfs-storage MMR02.company.xyz [ID: 2]
     301           Online         Good   beegfs-storage MMR03.company.xyz [ID: 3]
     302           Online         Good   beegfs-storage MMR03.company.xyz [ID: 3]
     401           Online         Good   beegfs-storage MMR04.company.xyz [ID: 4]
     402           Online         Good   beegfs-storage MMR04.company.xyz [ID: 4]
     501           Online         Good   beegfs-storage MMR05.company.xyz [ID: 5]
     502           Online         Good   beegfs-storage MMR05.company.xyz [ID: 5]
     601           Online         Good   beegfs-storage MMR06.company.xyz [ID: 6]
     602           Online         Good   beegfs-storage MMR06.company.xyz [ID: 6]


 [root@MMR01 log]# beegfs-ctl --listmirrorgroups --nodetype=meta
     BuddyGroupID     PrimaryNodeID   SecondaryNodeID
     ============     =============   ===============
                1                 4                 1
                2                 6                 3
                3                 2                 5
[root@MMR01 log]# beegfs-ctl --listmirrorgroups --nodetype=storage
     BuddyGroupID   PrimaryTargetID SecondaryTargetID
     ============   =============== =================
                1               101               302
                2               201               402
                3               502               301
                4               401               602
                5               102               501
                6               202               601
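Because the storage log names targetID 102, it can be useful to look up which buddy group that target belongs to and which target mirrors it. The sketch below assumes the `--listmirrorgroups` output keeps the three-column layout shown above; `buddy_of` is a hypothetical helper name, not part of beegfs-ctl:

```shell
# buddy_of TARGET_ID: read a --listmirrorgroups table on stdin and print
# the buddy group ID, the role of TARGET_ID, and its partner target.
# Assumes three numeric columns (group, primary, secondary) as shown above;
# header and separator lines are skipped because they are not numeric.
buddy_of() {
    awk -v t="$1" '
        $1 ~ /^[0-9]+$/ && NF == 3 {
            if ($2 == t) print "group=" $1, "role=primary", "buddy=" $3
            if ($3 == t) print "group=" $1, "role=secondary", "buddy=" $2
        }'
}

# Example (run on a node where beegfs-ctl is available):
# beegfs-ctl --listmirrorgroups --nodetype=storage | buddy_of 102
```

With the storage table above, target 102 is the primary of group 5 and its buddy is target 501.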

James Burton

May 26, 2020, 1:22:51 PM
to fhgfs...@googlegroups.com
You might have an issue with the underlying filesystem on your storage target. Can you verify that /beegfs/beegfs2/buddymir/u2C16F3F4/5EBF/6 exists on the storage target and has the correct permissions?
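One way to run that check is sketched below. It assumes GNU `stat` (for the `-c` format option), and `check_chunk_dir` is a hypothetical helper name used only for illustration:

```shell
# check_chunk_dir PATH: report whether PATH exists as a directory and,
# if so, print its octal mode, owner, and group (GNU stat assumed).
# Returns non-zero when the directory is missing.
check_chunk_dir() {
    if [ -d "$1" ]; then
        stat -c 'exists mode=%a owner=%U group=%G %n' "$1"
    else
        echo "missing $1"
        return 1
    fi
}

# Example, run on the storage node that hosts the target:
# check_chunk_dir /beegfs/beegfs2/buddymir/u2C16F3F4/5EBF/6
```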

--
You received this message because you are subscribed to the Google Groups "beegfs-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fhgfs-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fhgfs-user/2d5fa919-cb2d-47b5-9fb0-eef2029ee767%40googlegroups.com.


--
James Burton
OS and Storage Architect
Advanced Computing Infrastructure
Clemson University Computing and Information Technology
340 Computer Court
Anderson, SC 29625

dwakefi2.gmu

May 27, 2020, 11:56:35 AM
to beegfs-user
Every server seems to have that directory:

$$ pdsh -g beegfs 'ls -ltra /beegfs/beegfs2/buddymir/u2C16F3F4/5EBF/6/ | wc -l'
mmr01: 5957
mmr04: 5521
mmr06: 5399
mmr05: 5957
mmr02: 6357
mmr03: 6400

All entries in that directory have the same permissions and owner. The listing below filters out every line that matches the common mode and owner, so only non-matching lines remain:
$$ pdsh -g beegfs 'ls -ltra /beegfs/beegfs2/buddymir/u2C16F3F4/5EBF/6 | grep -v "drwxrwxrwx    2 root root"'
mmr06: total 1280
mmr06: drwxrwxrwx    8 root root     88 May 16 00:46 ..
mmr06: drwxrwxrwx 5398 root root 196608 May 25 13:38 .
mmr05: total 1316
mmr05: drwxrwxrwx    8 root root     88 May 16 00:46 ..
mmr05: drwxrwxrwx 5956 root root 200704 May 25 13:38 .
mmr04: total 1280
mmr04: drwxrwxrwx    8 root root     88 May 16 00:46 ..
mmr04: drwxrwxrwx 5520 root root 200704 May 25 13:39 .
mmr02: total 1416
mmr02: drwxrwxrwx    8 root root     88 May 16 00:46 ..
mmr02: drwxrwxrwx 6356 root root 208896 May 25 13:38 .
mmr03: total 1420
mmr03: drwxrwxrwx    8 root root     88 May 16 00:46 ..
mmr03: drwxrwxrwx 6399 root root 208896 May 25 18:37 .
mmr01: total 1320
mmr01: drwxrwxrwx    8 root root     88 May 16 00:46 ..
mmr01: drwxrwxrwx 5956 root root 196608 May 25 13:38 .


Any other ideas?
Thanks!



