Yesterday our meta server wedged and had to be rebooted: it had gone offline and we could not log in to it. Since the reboot, it won't start. I tried increasing FDlimit to 100k, and I disconnected or powered down all clients to make sure a hung connection wasn't the cause. Here is what the meta log says:
(3) Mar23 21:54:45 Main [App] >> Root directory loaded.
(1) Mar23 21:54:45 Main [App] >> Root metadata server (by possession of root directory): 1
(3) Mar23 21:54:45 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8005
(1) Mar23 21:54:45 Main [App] >> Waiting for beegfs-mgmtd@beegfs-meta1:8008...
(2) Mar23 21:54:45 RegDGramLis [Heartbeat incoming] >> New node: beegfs-mgmtd beegfs-meta1.dfci.harvard.edu [ID: 1]
(3) Mar23 21:54:45 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8005
(2) Mar23 21:54:45 Main [Register node] >> Node registration successful.
(3) Mar23 21:54:45 Main [NodeConn (acquire stream)] >> Connected: beegfs...@172.24.224.197:8008 (protocol: TCP)
(2) Mar23 21:54:45 Main [printSyncResults] >> Nodes added (sync results): 1 (Type: beegfs-meta)
(2) Mar23 21:54:45 Main [printSyncResults] >> Nodes added (sync results): 10 (Type: beegfs-storage)
(0) Mar23 21:54:45 Main [PThread.cpp:99] >> Received a SIGSEGV. Trying to shut down...
(1) Mar23 21:54:45 Main [PThread::signalHandler] >> Backtrace:
1: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread13signalHandlerEi+0x47) [0x755647]
2: /lib64/libc.so.6(+0x36280) [0x7f026de82280]
3: /opt/beegfs/sbin/beegfs-meta(_ZN18ExceededQuotaStore19updateExceededQuotaEPSt4listIjSaIjEE13QuotaDataType14QuotaLimitType+0x1e) [0x74be4e]
4: /opt/beegfs/sbin/beegfs-meta(_ZN15InternodeSyncer29downloadAllExceededQuotaListsESt10shared_ptrI11StoragePoolE+0x169) [0x4c3269]
5: /opt/beegfs/sbin/beegfs-meta(_ZN15InternodeSyncer29downloadAllExceededQuotaListsERKSt6vectorISt10shared_ptrI11StoragePoolESaIS3_EE+0xb2) [0x4c3c42]
6: /opt/beegfs/sbin/beegfs-meta(_ZN3App16downloadMgmtInfoER22TargetConsistencyState+0x1fa) [0x48771a]
7: /opt/beegfs/sbin/beegfs-meta(_ZN3App9runNormalEv+0x12f) [0x48c9ef]
8: /opt/beegfs/sbin/beegfs-meta(_ZN3App3runEv+0x52) [0x48cf72]
9: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread9runStaticEPv+0xfe) [0x481fee]
10: /opt/beegfs/sbin/beegfs-meta(_ZN7Program4mainEiPPc+0x49) [0x47f169]
11: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f026de6e3d5]
12: /opt/beegfs/sbin/beegfs-meta() [0x4818e5]
(0) Mar23 21:54:45 Main [PThread.cpp:135] >> Received a SIGABRT. Trying to shut down...
(1) Mar23 21:54:45 Main [PThread::signalHandler] >> Backtrace:
1: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread13signalHandlerEi+0x47) [0x755647]
2: /lib64/libc.so.6(+0x36280) [0x7f026de82280]
3: /lib64/libc.so.6(gsignal+0x37) [0x7f026de82207]
4: /lib64/libc.so.6(abort+0x148) [0x7f026de838f8]
5: /lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165) [0x7f026e9ad7d5]
6: /lib64/libstdc++.so.6(+0x5e746) [0x7f026e9ab746]
7: /lib64/libstdc++.so.6(+0x5d6f9) [0x7f026e9aa6f9]
8: /lib64/libstdc++.so.6(__gxx_personality_v0+0x564) [0x7f026e9ab364]
9: /lib64/libgcc_s.so.1(+0xf8a3) [0x7f026e4448a3]
10: /lib64/libgcc_s.so.1(_Unwind_RaiseException+0xfb) [0x7f026e444c3b]
11: /lib64/libstdc++.so.6(__cxa_throw+0x66) [0x7f026e9ab986]
12: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread13signalHandlerEi+0x296) [0x755896]
13: /lib64/libc.so.6(+0x36280) [0x7f026de82280]
14: /opt/beegfs/sbin/beegfs-meta(_ZN18ExceededQuotaStore19updateExceededQuotaEPSt4listIjSaIjEE13QuotaDataType14QuotaLimitType+0x1e) [0x74be4e]
15: /opt/beegfs/sbin/beegfs-meta(_ZN15InternodeSyncer29downloadAllExceededQuotaListsESt10shared_ptrI11StoragePoolE+0x169) [0x4c3269]
16: /opt/beegfs/sbin/beegfs-meta(_ZN15InternodeSyncer29downloadAllExceededQuotaListsERKSt6vectorISt10shared_ptrI11StoragePoolESaIS3_EE+0xb2) [0x4c3c42]
17: /opt/beegfs/sbin/beegfs-meta(_ZN3App16downloadMgmtInfoER22TargetConsistencyState+0x1fa) [0x48771a]
18: /opt/beegfs/sbin/beegfs-meta(_ZN3App9runNormalEv+0x12f) [0x48c9ef]
19: /opt/beegfs/sbin/beegfs-meta(_ZN3App3runEv+0x52) [0x48cf72]
20: /opt/beegfs/sbin/beegfs-meta(_ZN7PThread9runStaticEPv+0xfe) [0x481fee]
21: /opt/beegfs/sbin/beegfs-meta(_ZN7Program4mainEiPPc+0x49) [0x47f169]
22: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f026de6e3d5]
23: /opt/beegfs/sbin/beegfs-meta() [0x4818e5]
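The frames above are mangled C++ names. Read directly, the faulting frame is in ExceededQuotaStore::updateExceededQuota, reached from InternodeSyncer::downloadAllExceededQuotaLists during App::downloadMgmtInfo. For anyone who wants the full signatures, here is a minimal sketch that pulls the symbol names out of a pasted backtrace and passes them through c++filt. It assumes c++filt (from binutils) is installed and falls back to the raw names if it is not. The helper names extract_symbols and demangle are my own and not part of BeeGFS.

```python
import re
import shutil
import subprocess

# Matches glibc-style backtrace frames such as
#   "8: /opt/beegfs/sbin/beegfs-meta(_ZN3App3runEv+0x52) [0x48cf72]"
# and captures the symbol between "(" and "+0x...". Frames that carry no
# symbol, e.g. "libc.so.6(+0x36280)" or "beegfs-meta()", do not match.
FRAME_RE = re.compile(r"\((?P<sym>[A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-fA-F]+\)")


def extract_symbols(log_text):
    """Return the symbol names found in backtrace frames, in order."""
    return [m.group("sym") for m in FRAME_RE.finditer(log_text)]


def demangle(symbols):
    """Demangle via c++filt if it is on PATH; otherwise return names as-is."""
    if not symbols or shutil.which("c++filt") is None:
        return list(symbols)
    result = subprocess.run(
        ["c++filt"],
        input="\n".join(symbols) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

To use it, read the log text into a string, call extract_symbols on it, and pass the result to demangle. Plain C names such as gsignal or abort pass through c++filt unchanged.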
I contacted ThinkParQ (we have a contract with them) but have heard nothing back yet.
I suspect metadata corruption, but I am not sure how to fix it.