Weird cluster behavior

MPECS Inc.

Web: www.mpecsinc.com

Blog: blog.mpecsinc.com

Twitter: Twitter.com/MPECSInc

Teams: Phili...@MPECSInc.Cloud

Please note: Although we may sometimes respond to email, text and phone calls instantly at all hours of the day, our regular business hours are 8:00 AM - 5:00 PM, Monday thru Friday.

--
You received this message because you are subscribed to the Google Groups "ntsysadmin" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ntsysadmin+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/ntsysadmin/CAHBr%2B%2Bj2OM3HBtQ12Yx%3DSJP8SPOGCxNbLR2c3bR9SbvSaX0OwQ%40mail.gmail.com.

Lieckfeldt.Sven

Jul 24, 2025, 9:46:28 AM

to ntsys...@googlegroups.com

Hi Mike,

Maybe you have run into a bug that seems to exist in the Failover Cluster Service in Windows Server 2025. According to MS, a fix is coming in the next couple of months; however, it is not officially documented, except in the comments on the Exchange Team Blog post "Released: May 2025 Exchange Server Hotfix Updates" (Microsoft Community Hub).

Here is my example: an Exchange DAG cluster uses the Failover Cluster Service from the OS. If the cluster group is not moved to the other node before a restart, and the rebooted server holds the cluster group, all DBs might blow up. When the cluster group is moved before the reboot, everything is fine.

This basically makes it a DOA feature, because an outage doesn't ask kindly to move this role before taking the node away 😃
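For planned reboots, the workaround described above (move the cluster group first) could be scripted roughly like this. It is only a sketch under assumptions: the node names EX01/EX02 are hypothetical, the core group is assumed to have its default name "Cluster Group", and the helper only builds the `Move-ClusterGroup` command line rather than running it.

```python
from typing import List, Optional


def build_move_command(group_owner: str, node_to_reboot: str,
                       target_node: str,
                       group_name: str = "Cluster Group") -> Optional[List[str]]:
    """Return a PowerShell command line that moves the core cluster group
    off the node about to be rebooted, or None if no move is needed."""
    if group_owner.lower() != node_to_reboot.lower():
        return None  # the rebooting node does not hold the group
    if target_node.lower() == node_to_reboot.lower():
        raise ValueError("target node must differ from the node being rebooted")
    return [
        "powershell.exe", "-NoProfile", "-Command",
        f'Move-ClusterGroup -Name "{group_name}" -Node "{target_node}"',
    ]


# Hypothetical node names, for illustration only.
print(build_move_command("EX01", "EX01", "EX02"))  # move needed
print(build_move_command("EX02", "EX01", "EX02"))  # None: nothing to move
```

The returned list could be handed to `subprocess.run` on a cluster node. It does nothing for unplanned outages, which is exactly the gap pointed out above.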

Cheers,

Sven

From: ntsys...@googlegroups.com <ntsys...@googlegroups.com> On Behalf Of Mike Leone
Sent: Wednesday, July 23, 2025 21:04
To: NTSysAdmin <ntsys...@googlegroups.com>
Subject: [ntsysadmin] Weird cluster behavior

Philip Elder

Jul 24, 2025, 3:47:59 PM

to ntsys...@googlegroups.com

I’ve reached out to the team. I’ll let all y’all know once I hear back and share what I can.

Philip Elder MCTS

Senior Technical Architect

MPECS Inc.

https://support.microsoft.com/en-us/topic/july-8-2025-kb5062557-os-build-17763-7558-9a2cd65b-c7a7-4331-87c4-84790511f6fe

Philip Elder

Jul 24, 2025, 4:09:55 PM

to ntsys...@googlegroups.com

Are the CSVs BitLocker encrypted?

Philip Elder MCTS

Senior Technical Architect

MPECS Inc.

Mike Leone

Jul 24, 2025, 5:17:22 PM

to NTSysAdmin

No, no BitLocker on these drives.

Philip Elder

Jul 24, 2025, 5:34:44 PM

to ntsys...@googlegroups.com

I didn’t think so.

I’m waiting on feedback. Will follow-up one way or the other.

Philip Elder MCTS

Senior Technical Architect

MPECS Inc.

Philip Elder

Jul 24, 2025, 7:08:19 PM

to ntsys...@googlegroups.com

Mike,

The reply back from the team:

[QUOTE]

If there isn’t much running on this cluster, I would suggest trying to get the storage validation running. I think that can be accomplished by putting the disks into maintenance mode, if my memory serves me correctly – then storage validation should run on the disks, and PR issues should be surfaced.

I suspect that the issue could be LUN masking – maybe only one of the nodes can see the LUNs?

[/QUOTE]
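One rough way to check the LUN masking theory is to compare the set of disks each node can see. The sketch below assumes the disk identifiers have already been collected per node (for example, serial numbers from `Get-Disk` run on each host); the node names and serials are made up for illustration.

```python
from typing import Dict, Set


def disks_not_visible_everywhere(visible: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each disk ID to the nodes that cannot see it.

    Only disks that are missing on at least one node are returned."""
    all_disks = set().union(*visible.values())
    gaps: Dict[str, Set[str]] = {}
    for disk in sorted(all_disks):
        missing = {node for node, disks in visible.items() if disk not in disks}
        if missing:
            gaps[disk] = missing
    return gaps


# Hypothetical inventory: node B cannot see the third LUN.
inventory = {
    "nodeA": {"SN-001", "SN-002", "SN-003"},
    "nodeB": {"SN-001", "SN-002"},
}
print(disks_not_visible_everywhere(inventory))  # {'SN-003': {'nodeB'}}
```

An empty result would suggest both nodes see the same LUNs, which points away from masking and back toward reservations or cluster configuration.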

Please post the results if you can.

Thanks,

Philip Elder MCTS

Senior Technical Architect

MPECS Inc.

Mike Leone

Jul 25, 2025, 10:56:41 AM

to ntsys...@googlegroups.com

On Thu, Jul 24, 2025 at 7:08 PM Philip Elder <Phili...@mpecsinc.ca> wrote:

Mike,

The reply back from the team:

[QUOTE]

If there isn’t much running on this cluster, I would suggest trying to get the storage validation running. I think that can be accomplished by putting the disks into maintenance mode, if my memory serves me correctly – then storage validation should run on the disks, and PR issues should be surfaced.

I suspect that the issue could be LUN masking – maybe only one of the nodes can see the LUNs?

[/QUOTE]

We use disk quorum, and you can't turn on maintenance mode for that disk. But I did put the others into maintenance mode, and ran the validation again:

* The disks are already clustered and currently Online in the cluster. When testing a working cluster, ensure that the disks that you want to test are Offline in the cluster.

Maintenance mode leaves the disk Online, and none of the storage tests ran, because the disks were Online.

So I turned off maintenance mode, took the disks offline. That way, all 5 disks went offline.

I then ran the validation again.

It seems to think my quorum disk (which is only 1 GB in size) has no free space ...

Bringing the disks online shows the actual use and capacity.

All other disk tests passed ...

I ran these tests from host #27. So I went to host #28 (owner of the role and all disks, as shown above), and just rebooted it, to see if it would gracefully fail over to host #27. (I have another cluster using this same configuration, cluster storage via iSCSI from the same Nutanix cluster as above, and it works perfectly.)

This is while host #28 is rebooting:

And after host #28 comes back up, everything went back to it.

So same problem as before. I see cluster errors for each disk ...

I had disconnected all drives in iSCSI. I even deleted the Discovery Portal, and re-entered it. I had even deleted all the disks in the Nutanix Volume Group, and created new ones, which were then presented.

I am at a loss ..... The way Nutanix works, storage is (or can be) presented as iSCSI.

That is the same discovery target I am using on the cluster that works ...

Access to this Volume Group is presented to the 2 IPs of the nodes:

Z:\>nslookup 10.64.126.224
Server: DC1WRK014.wrk.ads.pha.phila.gov
Address: 10.64.7.95

Name: DC1DBS027.wrk.ads.pha.phila.gov
Address: 10.64.126.224

Z:\>nslookup 10.64.126.225
Server: DC1WRK014.wrk.ads.pha.phila.gov
Address: 10.64.7.95

Name: DC1DBS028.wrk.ads.pha.phila.gov
Address: 10.64.126.225

(kinda obviously, otherwise I wouldn't be seeing the disks in iSCSI on both hosts. All disks are CONNECTED in iSCSI, but only brought online in Disk Manager on host #28 - yes, I tried having them online in disk manager on both nodes, same failed results ...)

I am stumped, at this point. Especially since I have an earlier cluster with these same settings that is working just fine ...

Mike Leone

Jul 25, 2025, 11:30:36 AM

to ntsys...@googlegroups.com

SO ...

If the quorum disk is on host #27, and all other disks are on host #28, and I reboot host #27 ... all is fine. Disks move to host #28, role (User Manager) goes over to host #28.

If Quorum disk is on host #28, and all other disks are on host #27, and I reboot host #28 ... the disks stay on host #27. The QUORUM does NOT go to host #27, it comes back to host #28 (as does the role) ...

Role User Manager has no preferred owner. So I checked off both nodes as preferred owners (hey, I'm clutching at straws here ...).

Failback is set to "Prevent Failback", so I left it at that.

I checked all disks; "Possible Owners" are both nodes.

So I reboot host #28 ... The quorum disk and role went to host #27 ... and stayed there, as it should!

So now, with role, quorum, and all disks on host #27, I decide to try rebooting host #27 ...

Role, quorum, and all disks go over to host #28, exactly as they should.

In other words ... all working?!?!

I tried again.

Role, quorum, disks all on host #28. Reboot host #28. All moved back to host #27, exactly as it should ...

Role, quorum, disks all on host #27. Reboot host #27. Problem is back: quorum went to host #28, role stayed on host #27, as did the disks. They TRIED to come online on host #28, and 2 did. But as soon as host #28 came back, all the disks went back to host #27 ...
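To keep track of runs like these, the before/after ownership can be written down and compared. This is only a bookkeeping sketch: with "Prevent Failback" set, anything that was on the rebooted host should end up on the other host and stay there. The resource labels ("role", "quorum", "disks") are shorthand, and the values are taken from the last two runs described above.

```python
from typing import Dict, List


def stuck_on_rebooted_node(before: Dict[str, str], after: Dict[str, str],
                           rebooted: str) -> List[str]:
    """List resources that were on the rebooted node before the reboot and
    are on it again afterwards, i.e. they did not stay failed over."""
    return sorted(
        name for name, owner in before.items()
        if owner == rebooted and after.get(name) == rebooted
    )


# Good run: everything on host #28, reboot #28, everything ends up on #27.
good_before = {"role": "28", "quorum": "28", "disks": "28"}
good_after = {"role": "27", "quorum": "27", "disks": "27"}
print(stuck_on_rebooted_node(good_before, good_after, rebooted="28"))  # []

# Bad run: everything on host #27, reboot #27, only the quorum moved.
bad_before = {"role": "27", "quorum": "27", "disks": "27"}
bad_after = {"role": "27", "quorum": "28", "disks": "27"}
print(stuck_on_rebooted_node(bad_before, bad_after, rebooted="27"))  # ['disks', 'role']
```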

Don't ask me, I just work here ....

At this point, I may just destroy everything. Cluster, delete the nodes, delete the Volume Group, start all over. (which I think I've already tried ....)

It's practically Friday lunchtime, maybe I can just let it sit until Monday ... I have other tasks to occupy my afternoon ...

Philip Elder

Jul 25, 2025, 11:51:25 AM

to ntsys...@googlegroups.com

Mike,

Does the Cluster Name Object IP require access to the Nutanix iSCSI Target?

Is that how the other cluster is set up? Three IPs with access: one for each node and one for the CNO?

Philip Elder MCTS

Senior Technical Architect

MPECS Inc.