XNodeSync [Auto-offline] >> No state report received from storage target for 228 seconds. Setting storage target to offline


josh.d...@convergecfd.com

Dec 27, 2016, 3:16:36 PM
to beegfs-user
Hi guys,

I updated our small BeeGFS cluster to v6 today, and I really wish I had not. It has been nothing but problems, and I can't figure out what's going wrong.

We have one metadata server that also doubles as management.
We have two storage servers.

We first brought all services down, and then updated each machine one by one.
We then brought the metadata/mgmtd server back up, waited a bit, then brought the storage servers back up.
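
For reference, this is roughly the sequence we ran (from memory, and assuming the standard BeeGFS systemd unit names; the package commands will differ by distro, yum shown here):

  # on every machine: stop the BeeGFS services before updating
  systemctl stop beegfs-storage                 # storage servers
  systemctl stop beegfs-meta beegfs-mgmtd       # metadata/mgmt server

  # update the packages on each machine
  yum update "beegfs*"

  # bring things back up, mgmtd/meta first, wait a bit, then storage
  systemctl start beegfs-mgmtd beegfs-meta      # metadata/mgmt server
  systemctl start beegfs-storage                # storage servers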
Now none of the nodes will stay connected. I keep getting this message:

XNodeSync [Auto-offline] >> No state report received from storage target for 228 seconds. Setting storage target to offline.

and a similar one for the metadata node. Then, after maybe another minute or two, they come back online, then go offline again, then come back up, and so on. You get the idea.

Any help is super appreciated.

josh.d...@convergecfd.com

Dec 27, 2016, 3:19:04 PM
to beegfs-user
Here is some more sample output from the mgmtd log:

(2) Dec27 14:13:22 Worker3 [Node registration] >> New node: beegfs-client 7D6-5862CB62-meta01-bgfs [ID: 1]; Ver: 6.2-0; Source: 172.16.200.101:44696
(2) Dec27 14:13:25 Worker1 [Change consistency states] >> Metadata node is coming online. ID: 1
(2) Dec27 14:13:27 XNodeSync [Assign node to capacity pool] >> Metadata node capacity pool assignment updated. NodeID: 1; Pool: Normal; Reason: Free capacity threshold
(2) Dec27 14:13:27 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 4; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:13:27 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 5; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:13:27 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 6; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:13:27 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 7; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:13:27 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 8; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:13:27 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 9; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:13:32 Worker1 [Change consistency states] >> Storage target is coming online. ID: 4
(2) Dec27 14:13:32 Worker1 [Change consistency states] >> Storage target is coming online. ID: 5
(2) Dec27 14:13:32 Worker1 [Change consistency states] >> Storage target is coming online. ID: 6
(2) Dec27 14:13:33 Worker2 [Change consistency states] >> Storage target is coming online. ID: 7
(2) Dec27 14:13:33 Worker2 [Change consistency states] >> Storage target is coming online. ID: 8
(2) Dec27 14:13:33 Worker2 [Change consistency states] >> Storage target is coming online. ID: 9
(2) Dec27 14:13:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 4; Pool: Normal.
(2) Dec27 14:13:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 5; Pool: Normal.
(2) Dec27 14:13:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 6; Pool: Normal.
(2) Dec27 14:13:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 7; Pool: Normal.
(2) Dec27 14:13:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 8; Pool: Normal.
(2) Dec27 14:13:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 9; Pool: Normal.
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from storage target for 178 seconds. Setting storage target to probably-offline. Storage target ID: 4
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from storage target for 178 seconds. Setting storage target to probably-offline. Storage target ID: 5
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from storage target for 178 seconds. Setting storage target to probably-offline. Storage target ID: 6
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from storage target for 180 seconds. Setting storage target to offline. Storage target ID: 7
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from storage target for 180 seconds. Setting storage target to offline. Storage target ID: 8
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from storage target for 180 seconds. Setting storage target to offline. Storage target ID: 9
(2) Dec27 14:16:37 XNodeSync [Auto-offline] >> No state report received from metadata node for 179 seconds. Setting metadata node to probably-offline. Metadata node ID: 1
(2) Dec27 14:16:37 XNodeSync [Assign node to capacity pool] >> Metadata node capacity pool assignment updated. NodeID: 1; Pool: Emergency; Reason: No capacity report received.
(2) Dec27 14:16:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 4; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:16:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 5; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:16:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 6; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:16:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 7; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:16:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 8; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:16:37 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 9; Pool: Emergency;  Reason: No capacity report received.
(2) Dec27 14:17:37 XNodeSync [Auto-offline] >> No state report received from storage target for 238 seconds. Setting storage target to offline. Storage target ID: 4
(2) Dec27 14:17:37 XNodeSync [Auto-offline] >> No state report received from storage target for 238 seconds. Setting storage target to offline. Storage target ID: 5
(2) Dec27 14:17:37 XNodeSync [Auto-offline] >> No state report received from storage target for 238 seconds. Setting storage target to offline. Storage target ID: 6
(2) Dec27 14:17:37 XNodeSync [Auto-offline] >> No state report received from metadata node for 239 seconds. Setting metadata node to offline. Metadata node ID: 1
(2) Dec27 14:18:22 Worker4 [Change consistency states] >> Storage target is coming online. ID: 7
(2) Dec27 14:18:22 Worker4 [Change consistency states] >> Storage target is coming online. ID: 8
(2) Dec27 14:18:22 Worker4 [Change consistency states] >> Storage target is coming online. ID: 9
(2) Dec27 14:18:22 Worker4 [Change consistency states] >> Storage target is coming online. ID: 4
(2) Dec27 14:18:22 Worker4 [Change consistency states] >> Storage target is coming online. ID: 5
(2) Dec27 14:18:22 Worker4 [Change consistency states] >> Storage target is coming online. ID: 6
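
In case it helps, I have been watching the flapping with what I believe is the right beegfs-ctl invocation:

  watch -n 5 beegfs-ctl --listtargets --nodetype=storage --state

The reachability state cycles through online / probably-offline / offline and back, matching the log above.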

josh.d...@convergecfd.com

Dec 27, 2016, 3:34:54 PM
to beegfs-user
Another note, in case it wasn't already obvious: we did not have this issue with the 2015 release.

Also, for some reason, any beegfs-ctl command now takes about a minute and a half to execute.
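
For example, even something as basic as listing the storage nodes sits there for about 90 seconds before printing anything:

  time beegfs-ctl --listnodes --nodetype=storage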

Sven Breuner

Jan 10, 2017, 7:21:33 AM
to fhgfs...@googlegroups.com
Hi,

just in case someone else encounters the same symptoms: this was caused by old
clients that were still running and trying to communicate with the new servers,
which is intentionally not possible due to significant changes in the network
protocol. Stopping the old clients resolved the problem.
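
To check for leftover clients, you can ask the management daemon which clients it currently knows, with something like:

  beegfs-ctl --listnodes --nodetype=client

Then stop the BeeGFS client on each of those hosts before upgrading the servers:

  systemctl stop beegfs-client beegfs-helperd

(On non-systemd machines, /etc/init.d/beegfs-client stop accordingly.)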

The upgrade guide has been updated to make it clearer that old clients need
to be stopped before upgrading the servers to the new v6 major release:
http://www.beegfs.com/wiki/Upgrade2015To6

Best regards,
Sven