Hi
Since upgrading the estate to 7.4.2, we have had 2 crashes in the metadata service, about 1 week apart.
We run 2 metadata services on different servers, and each has crashed once.
In both cases a "Failed to write inlined inode" error was logged several hours before the SIGSEGV. It reports a similar but not identical file name each time. Neither file currently exists on the file system.
I suspected something to do with hard links, but all 8000 files on the system with a name similar to those in the error messages (kmers_raw*) have a hard link count of 1. We have not yet checked for or migrated "old style" hard links, in case that turns out to be relevant.
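For anyone who wants to repeat the link-count check, here is a minimal sketch of that kind of scan. It is not the exact command we ran: the function name `find_multi_linked` and the idea of walking from a mount point are illustrative assumptions. It only uses the link count reported by the client mount, so it says nothing about "old style" hard links on the metadata side.

```python
import fnmatch
import os
import stat


def find_multi_linked(root, pattern="kmers_raw*"):
    """Return (path, nlink) for regular files under root that match
    pattern and report a hard link count greater than 1."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                hits.append((path, st.st_nlink))
    return sorted(hits)
```

Calling `find_multi_linked("/path/to/mount")` (the path is a placeholder) returns an empty list when every matching file has a link count of 1, which is what we saw.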
The error messages from the metadata server are included below. Has this been seen before? Any hints as to how we might start debugging this would be greatly appreciated.
(2) May30 07:45:57 Worker7 [FileInode (store updated Inode)] >> Failed to write inlined inode: parentID: 44-665814BE-1 entryID: 1D-665816F4-1 fileName: kmers_raw_LH11Hc.0 Error: Internal error
(0) May30 14:07:16 Worker24 [PThread.cpp:99] >> Received a SIGSEGV. Trying to shut down...
(1) May30 14:07:16 Worker24 [PThread::signalHandler] >> Backtrace:
1: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread13signalHandlerEi+0x47) [0x73fef7]
2: /lib64/libc.so.6(+0x54df0) [0x7f8cf6654df0]
3: /opt/beegfs/sbin/beegfs-meta(_ZN14MsgHelperClose9closeFileE9NumericIDIj12NumNodeIDTagERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP9EntryInfoijPbSD_PSt6vectorI18DynamicFileAttribsSaISF_EEP18MirroredTimestamps+0x215) [0x58ad65]
4: /opt/beegfs/sbin/beegfs-meta(_ZN14CloseFileMsgEx16closeFilePrimaryERN10NetMessage15ResponseContextE+0x2d4) [0x5cd174]
5: /opt/beegfs/sbin/beegfs-meta(_ZN15MirroredMessageI12CloseFileMsg10FileIDLockE15processIncomingERN10NetMessage15ResponseContextE+0x524) [0x5cfb64]
6: /opt/beegfs/sbin/beegfs-meta(_ZN27IncomingPreprocessedMsgWork7processEPcjS0_j+0x180) [0x74f900]
7: /opt/beegfs/sbin/beegfs-meta(_ZN6Worker8workLoopE13QueueWorkType+0x146) [0x749c16]
8: /opt/beegfs/sbin/beegfs-meta(_ZN6Worker3runEv+0x58) [0x74a2d8]
9: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread9runStaticEPv+0x11c) [0x4f6d8c]
10: /lib64/libc.so.6(+0x9f802) [0x7f8cf669f802]
11: /lib64/libc.so.6(+0x3f450) [0x7f8cf663f450]
(0) May30 14:07:16 Worker24 [App (component exception handler)] >> The component [Worker24] encountered an unrecoverable error. [SysErr: No such file or directory] Exception message: Segmentation fault
(2) May30 14:07:16 Worker24 [App (component exception handler)] >> Shutting down...
(3) May30 14:07:17 Main [App] >> Stored 2 sessions and 0 mirrored sessions
###############################################################
(2) Jun04 23:24:26 Worker18 [FileInode (store updated Inode)] >> Failed to write inlined inode: parentID: 0-665F92B0-1 entryID: F4-665F928A-2 fileName: kmers_raw_Y1YsA9.6 Error: Internal error
(0) Jun05 10:17:03 Worker7 [PThread.cpp:99] >> Received a SIGSEGV. Trying to shut down...
(1) Jun05 10:17:03 Worker7 [PThread::signalHandler] >> Backtrace:
1: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread13signalHandlerEi+0x47) [0x73fef7]
2: /lib64/libc.so.6(+0x54df0) [0x7f4351054df0]
3: /opt/beegfs/sbin/beegfs-meta(_ZN14MsgHelperClose24closeChunkFileSequentialE9NumericIDIj12NumNodeIDTagERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiR9FileInodeP9EntryInfojPSt6vectorI18DynamicFileAttribsSaISG_EE+0xb5) [0x587e85]
4: /opt/beegfs/sbin/beegfs-meta(_ZN14MsgHelperClose9closeFileE9NumericIDIj12NumNodeIDTagERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP9EntryInfoijPbSD_PSt6vectorI18DynamicFileAttribsSaISF_EEP18MirroredTimestamps+0x240) [0x58ad90]
5: /opt/beegfs/sbin/beegfs-meta(_ZN14CloseFileMsgEx16closeFilePrimaryERN10NetMessage15ResponseContextE+0x2d4) [0x5cd174]
6: /opt/beegfs/sbin/beegfs-meta(_ZN15MirroredMessageI12CloseFileMsg10FileIDLockE15processIncomingERN10NetMessage15ResponseContextE+0x524) [0x5cfb64]
7: /opt/beegfs/sbin/beegfs-meta(_ZN27IncomingPreprocessedMsgWork7processEPcjS0_j+0x180) [0x74f900]
8: /opt/beegfs/sbin/beegfs-meta(_ZN6Worker8workLoopE13QueueWorkType+0x146) [0x749c16]
9: /opt/beegfs/sbin/beegfs-meta(_ZN6Worker3runEv+0x58) [0x74a2d8]
10: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread9runStaticEPv+0x11c) [0x4f6d8c]
11: /lib64/libc.so.6(+0x9f802) [0x7f435109f802]
12: /lib64/libc.so.6(+0x3f450) [0x7f435103f450]
(0) Jun05 10:17:03 Worker7 [App (component exception handler)] >> The component [Worker7] encountered an unrecoverable error. [SysErr: No such file or directory] Exception message: Segmentation fault
(2) Jun05 10:17:03 Worker7 [App (component exception handler)] >> Shutting down...
(3) Jun05 10:17:04 Main [App] >> Stored 7 sessions and 0 mirrored sessions
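
One thing that might help line up the "Failed to write inlined inode" lines with file activity: I am assuming (not confirmed) that an entryID has the form <counter>-<timestamp>-<nodeID>, with all three fields in hex and the middle field being a Unix creation time in seconds. If that assumption holds, the entryIDs in the logs above can be decoded with a small sketch like this. The helper name `decode_entry_id` is made up for illustration.

```python
from datetime import datetime, timezone


def decode_entry_id(entry_id):
    """Split an entry ID of the ASSUMED form <counter>-<timestamp>-<nodeID>.

    All three fields are treated as hexadecimal, and the middle field is
    treated as a Unix timestamp in seconds. Both points are assumptions.
    """
    counter_hex, timestamp_hex, node_hex = entry_id.split("-")
    created = datetime.fromtimestamp(int(timestamp_hex, 16), tz=timezone.utc)
    return {
        "counter": int(counter_hex, 16),
        "created_utc": created,
        "node_id": int(node_hex, 16),
    }


# The two entryIDs reported in the log excerpts above.
for entry_id in ("1D-665816F4-1", "F4-665F928A-2"):
    info = decode_entry_id(entry_id)
    print(entry_id, info["created_utc"].isoformat(), "node", info["node_id"])
```

The times printed are in UTC, whereas the log timestamps are presumably in the server's local time, so allow for the offset when comparing them.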