Thank you very much for your reply.
I used the command "beegfs-ctl --listtargets --nodetype=meta --state" to check the status of the meta services. It seems that the meta target of master01 is good:
TargetID  Reachability  Consistency  NodeID
========  ============  ===========  ======
     201        Online         Good     201
     202        Online  Need-resync     202
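For anyone who wants to check this state repeatedly, here is a minimal Python sketch that picks out the targets that are not Online/Good from output shaped like the table above. It assumes exactly the four whitespace-separated columns shown in this post; the function names are hypothetical and the real output on other BeeGFS versions may look different.

```python
from typing import Dict, List

# Sample text copied from the table in this post (an assumption about the
# general output shape, not a guaranteed format).
SAMPLE = """TargetID Reachability Consistency NodeID
===== ======== ========= ======
201 Online Good 201
202 Online Need-resync 202"""


def parse_target_states(output: str) -> List[Dict[str, str]]:
    """Turn the state table into a list of dicts, skipping header lines."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four columns and start with a numeric target ID.
        if len(fields) == 4 and fields[0].isdigit():
            rows.append({
                "target_id": fields[0],
                "reachability": fields[1],
                "consistency": fields[2],
                "node_id": fields[3],
            })
    return rows


def targets_not_good(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows that are not both Online and Good."""
    return [
        r for r in rows
        if r["reachability"] != "Online" or r["consistency"] != "Good"
    ]


if __name__ == "__main__":
    for row in targets_not_good(parse_target_states(SAMPLE)):
        print(row["target_id"], row["reachability"], row["consistency"])
```

With the table from this post, only target 202 is reported.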
I also used telnet and nc to verify, in both directions between the nodes, that TCP port 8005 is reachable.
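The same kind of reachability check can be sketched in Python, which makes it easier to loop over several nodes. This is only an illustration: the helper name is hypothetical and the host names are passed on the command line. Note that a successful TCP connect only shows that something accepts connections on the port; it does not show that the meta service answers requests in time.

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Usage (host names are examples): python check_port.py master01 master02
    for host in sys.argv[1:]:
        state = "reachable" if tcp_port_open(host, 8005) else "NOT reachable"
        print(f"{host}:8005 {state}")
```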
Additionally, I collected the following log entries from the meta service on master01, starting before the "Resync job" messages:
(2) Apr12 18:14:19 CommSlave55 [MessagingTk.cpp:445] >> Unable to connect, is the node offline? node: beegfs-storage node01 [ID: 204]; Message type: GetChunkFileAttribs (2017)
(2) Apr12 18:14:19 CommSlave55 [Stat chunk file work] >> Communication with storage target failed. TargetID: 2004; EntryID: 17-639FDADD-C9
(2) Apr12 18:14:19 Worker32 [Stat Helper (refresh chunk files)] >> Problems occurred during file attribs refresh. entryID: 17-639FDADD-C9
(2) Apr12 18:14:19 XNodeSync [BuddyCommTk.cpp:206] >> Resync job currently running. Buddy node ID: 202
(2) Apr12 18:14:20 Worker29 [Close Helper (close chunk files S)] >> Communication with storage target failed: 2004; FileHandle: 60ECBDD2#4EB-61612DA7-C9; Error: Communication error
(2) Apr12 18:14:20 Worker29 [Close Helper (close chunk files S)] >> Problems occurred during close of chunk files. FileHandle: 60ECBDD2#4EB-61612DA7-C9
(2) Apr12 18:14:23 CommSlave23 [Trunc chunk file work] >> Communication with storage target failed. TargetID: 2004; EntryID: 0-6436846F-C9
(2) Apr12 18:14:23 Worker27 [Trunc chunk file helper] >> Problems occurred during truncation of storage server chunk files. File: 0-6436846F-C9
.........more problem chunk files
(2) Apr12 18:46:17 CommSlave47 [Stat chunk file work] >> Communication with storage target failed. TargetID: 2004; EntryID: 4-643683BB-C9
(2) Apr12 18:46:17 Worker20 [Stat Helper (refresh chunk files)] >> Problems occurred during file attribs refresh. entryID: 4-643683BB-C9
(Note: management.log shows that the node01 storage service connected at 18:44.)
(2) Apr12 18:46:22 XNodeSync [BuddyCommTk.cpp:206] >> Resync job currently running. Buddy node ID: 202
(2) Apr12 18:46:52 XNodeSync [BuddyCommTk.cpp:206] >> Resync job currently running. Buddy node ID: 202
.....until Apr 13 09:00
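To get an overview of a long log like this, a small Python sketch can count the "Resync job currently running" messages, record the first and last time they appear, and tally communication failures per storage target. The regular expressions are assumptions based only on the line format shown above, and the function name is hypothetical.

```python
import re
from collections import Counter

# "(level) date time thread [context] >> message", as seen in the excerpt.
LINE_RE = re.compile(
    r"^\((\d+)\) (\w+) (\d{2}:\d{2}:\d{2}) (\S+) \[(.*?)\] >> (.*)$"
)
# Matches both "failed. TargetID: 2004" and "failed: 2004".
TARGET_FAIL_RE = re.compile(r"storage target failed(?:\. TargetID)?: (\d+)")

# A few lines copied from the log excerpt in this post.
SAMPLE = """\
(2) Apr12 18:14:19 CommSlave55 [Stat chunk file work] >> Communication with storage target failed. TargetID: 2004; EntryID: 17-639FDADD-C9
(2) Apr12 18:14:19 XNodeSync [BuddyCommTk.cpp:206] >> Resync job currently running. Buddy node ID: 202
(2) Apr12 18:14:20 Worker29 [Close Helper (close chunk files S)] >> Communication with storage target failed: 2004; FileHandle: 60ECBDD2#4EB-61612DA7-C9; Error: Communication error
(2) Apr12 18:46:22 XNodeSync [BuddyCommTk.cpp:206] >> Resync job currently running. Buddy node ID: 202
"""


def summarize(lines):
    """Summarize resync messages and storage target failures."""
    resync_times = []
    failures = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip notes and elided lines
        _level, date, time, _thread, _context, message = match.groups()
        if message.startswith("Resync job currently running"):
            resync_times.append(f"{date} {time}")
        failed = TARGET_FAIL_RE.search(message)
        if failed:
            failures[failed.group(1)] += 1
    return {
        "resync_messages": len(resync_times),
        "first_resync": resync_times[0] if resync_times else None,
        "last_resync": resync_times[-1] if resync_times else None,
        "target_failures": dict(failures),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

Run against the full /var/log/beegfs-meta.log instead of the sample, this would show how long the resync messages kept repeating and which storage targets the meta service could not reach.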
In my opinion, even if a metadata resync is required, the amount of data to synchronize is small and the resync should complete quickly.
In fact, I disabled the services on the other nodes and then restarted the nodes, with master01 restarted last. After its meta service started, I manually set its state to good, and the automatic synchronization then completed quickly.
Does this mean that the meta service on master01 had also failed but had not updated its state before I rebooted? However, I did not see any related messages from "systemctl status beegfs-meta" or "more /var/log/beegfs-meta.log".
If the meta service had crashed, I should not have received so many "Resync job currently running" messages all night; if the meta service was healthy, I should not have received "Receive timeout from beegfs-meta master01" from the clients.
Do you have any more ideas? Thanks.