Windows 2025 cluster issues - quorum shows lost, even when using file share witness


Mike Leone

Aug 19, 2025, 11:07:47 AM
to NTSysAdmin
This is becoming a major problem, and an impediment to our upgrades.

We use Nutanix AHV as a hypervisor. I create a Win 2025 cluster, using iSCSI to access the shared storage. According to the Nutanix tech, this is their recommended method of shared storage.

For quorum, I am using a File Share Witness. The problem comes when restarting the nodes.

Specifically, when I reboot what should be a passive node (i.e., no roles on it, no disks assigned to it, nothing), the node itself reboots, but the cluster (now supposedly moving over to, and running on, the other node) goes offline while the first node is rebooting, and doesn't come back online until the first node finishes rebooting.

Cluster Event log: 

The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

But Quorum was NOT lost. The other node was always online, and pointing at the File Share Witness. Everything just went offline, and stayed offline, until the other node came back.

Specifically: the SQL Server role and the User Manager role were both on node #29. All storage is assigned to the SQL Server role; there is no available storage.
I rebooted the other node, node #30. That should be fine, right? It's the passive node; no resources are listed on it. But nope ... the whole cluster went offline, and stayed that way, until node #30 came back.

I am using FSW because I was having these same issues when using a disk quorum. I figured with a file share witness, I'd never lose quorum, as long as the file share witness was there. And it is, the witness never went offline.
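For reference, here's the vote math I'm relying on, as a simplified sketch. This is my own illustrative model, not the actual cluster service logic; it ignores dynamic quorum, vote weighting, and witness arbitration, but it shows why rebooting one node of a 2-node + witness cluster should never cost quorum:

```python
# Simplified majority-vote model of Failover Cluster quorum.
# Illustration only -- NOT the real cluster service algorithm
# (no dynamic quorum, no vote weighting, no witness arbitration).

def has_quorum(nodes_up: int, total_nodes: int,
               witness_up: bool, witness_configured: bool) -> bool:
    """True if strictly more than half of all configured votes are reachable."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_up = nodes_up + (1 if (witness_configured and witness_up) else 0)
    return votes_up > total_votes / 2

# 2 nodes + file share witness = 3 votes total.
# Reboot one node: 1 node vote + 1 witness vote = 2 of 3 -> quorum holds.
print(has_quorum(nodes_up=1, total_nodes=2,
                 witness_up=True, witness_configured=True))

# Without any witness, 1 of 2 votes is not a majority -> quorum lost.
print(has_quorum(nodes_up=1, total_nodes=2,
                 witness_up=False, witness_configured=False))
```

By this arithmetic, with the surviving node and the witness both online, the cluster still holds 2 of 3 votes, so the "quorum was lost" event shouldn't fire.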

The cluster itself has both nodes as possible owners, under Advanced Policies. Properties of the role (SQL 2022, in my case) has no preferred owners, and failback is set to "Prevent failback".


Thoughts? Pointers?

--

Mike. Leone, <mailto:tur...@mike-leone.com>

PGP Fingerprint: 0AA8 DC47 CB63 AE3F C739 6BF9 9AB4 1EF6 5AA5 BCDF
Photo Gallery: <http://www.flickr.com/photos/mikeleonephotos>

Philip Elder

Aug 19, 2025, 1:08:51 PM
to ntsys...@googlegroups.com

Set up a 100GB shared LUN.

Connect that LUN to the nodes via iSCSI Target connection.

Set it to OFFLINE on one.

Format it NTFS on the other.

Call it Quorum.

Run the Witness Wizard and set that LUN as your quorum location.

 

Get rid of the network share. Something is not set up right there.

 

Philip Elder MCTS

Senior Technical Architect

Microsoft High Availability MVP

MPECS Inc.

E-mail: Phili...@mpecsinc.ca

Phone: +1 (780) 458-2028

Web: www.mpecsinc.com

Blog: blog.mpecsinc.com

Twitter: Twitter.com/MPECSInc

Teams: Phili...@MPECSInc.Cloud

 

Please note: Although we may sometimes respond to email, text and phone calls instantly at all hours of the day, our regular business hours are 8:00 AM - 5:00 PM, Monday thru Friday.

--
You received this message because you are subscribed to the Google Groups "ntsysadmin" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ntsysadmin+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/ntsysadmin/CAHBr%2B%2BicWSvrj189S%3DFroEzSzK91ufiq1zT2SzmUp714pbseQQ%40mail.gmail.com.

Mike Leone

Aug 19, 2025, 1:13:05 PM
to ntsys...@googlegroups.com
On Tue, Aug 19, 2025 at 1:08 PM Philip Elder <Phili...@mpecsinc.ca> wrote:

Set up a 100GB shared LUN.

Connect that LUN to the nodes via iSCSI Target connection.

Set it to OFFLINE on one.

Format it NTFS on the other.

Call it Quorum.

Run the Witness Wizard and set that LUN as your quorum location.

 

Get rid of the network share. Something is not set up right there.


That was the way I had it before, a disk quorum. Same errors.

Mike Leone

Aug 19, 2025, 1:29:35 PM
to ntsys...@googlegroups.com
Also, not the only cluster I am seeing this behavior on. So it's not just this one cluster ...


Philip Elder

Aug 19, 2025, 1:41:40 PM
to ntsys...@googlegroups.com

Source 1: Bad VLAN port setting, tag path, or MTU setting

Source 2: Titanix (Nutanix) is the source of the problem

 

Can you put a file share witness on a system that is separate from Nutanix?

 


Philip Elder

Aug 19, 2025, 1:42:10 PM
to ntsys...@googlegroups.com

The other cluster(s) being off the Nutanix setup?

Mike Leone

Aug 19, 2025, 2:25:17 PM
to ntsys...@googlegroups.com
On Tue, Aug 19, 2025 at 1:41 PM Philip Elder <Phili...@mpecsinc.ca> wrote:

Source 1: Bad VLAN port setting, tag path, or MTU setting


Nope. File share witness is there, accessible from both nodes, shows as Online in Failover Cluster Manager.
Share itself has the cluster computer account with FULL control in the NTFS settings, and READ/WRITE in the share settings, as MS wants.


Source 2: Titanix (Nutanix) is the source of the problem

 

Can you put a file share witness on a system that is separate from Nutanix?


No, don't have anything.

At this point, I'm down to 2 options - try Win 2022 instead of Win 2025 (just a guess), or maybe ... 3 nodes and a file share, instead of 2 nodes and a file share (another guess).

I confused 2 different Nutanix engineers with this. LOL. I don't have an MS support account to open a case; they wanted MS involved ...

Mike Leone

Aug 19, 2025, 2:29:49 PM
to ntsys...@googlegroups.com
On Tue, Aug 19, 2025 at 1:42 PM Philip Elder <Phili...@mpecsinc.ca> wrote:

The other cluster(s) being off the Nutanix setup?


I have like 23 clusters (yes, really) running on ESXi, which runs on top of Nutanix. Not a problem with any of them.

Now we're moving to using Nutanix as the hypervisor, rather than VMware, and all these cluster issues come up ...

Philip Elder

Aug 19, 2025, 2:53:53 PM
to ntsys...@googlegroups.com

Ah.

 

Nutanix AHV is the problem then.

 

If things work as expected using VMware as your hypervisor _on_ the Nutanix HCI but when you start using AHV the problems arise then there’s the key right there.

 

You could stand up a virtual S2D (Storage Spaces Direct) 2-node cluster on Nutanix HCI and see if the problems follow. I suspect not, though, as VMware and Hyper-V are very similar in their performance, stability, and network integration.

 

AHV? I’m not so sure. But, the pointers are there to AHV.

 


Mike Leone

Aug 21, 2025, 11:07:36 AM
to ntsys...@googlegroups.com
On Tue, Aug 19, 2025 at 2:53 PM Philip Elder <Phili...@mpecsinc.ca> wrote:

Ah.

 

Nutanix AHV is the problem then.


Maybe not ...

Here is an update:

I uninstalled my cluster and deleted the 2 Win 2025 VMs. I rebuilt it using Win 2022. It accessed the same Volume Group, using the same iSCSI and same IPs. This time I used a disk quorum, which is how we usually build our clusters.

And it works perfectly.

If I have the disk quorum, and the role (SQL Server) and all the disks associated with the role, on node DBS029, and I reboot DBS029, everything smoothly moves over to DBS030, as it should, and never goes offline.
If I have the disk quorum, and the role (SQL Server) and all the disks associated with the role, on node DBS030, and I reboot DBS030, everything smoothly moves over to DBS029, as it should, and never goes offline.

I repeated this test 4 times, rebooting each node which held all roles, quorum, and disks. In all cases, everything transitioned smoothly to the other node, exactly as it should.

So at this point, all I can assume is that Win 2025 clustering does NOT work on the version of AHV that I am running, since a similar configuration using Win 2022 on that version of AHV does work as expected.

When using Win 2025, even with a File Share Witness, things would work with 1 node, but not the other. Meaning: if I moved the role and its disks to DBS029, and rebooted DBS030, the whole cluster would go DOWN, and stay offline, until DBS030 came back up. Then the cluster would come back. The cluster log showed quorum lost. Mind you, the quorum was a File Share Witness, which was ONLINE when I rebooted DBS030. So quorum should never have been lost, since DBS029 was always there, and the File Share Witness was always there, showing ONLINE in Cluster Manager.

That happened whether I used a File Share Witness or a Disk quorum.

I dunno what the deal is, but I've spent over a week at this. If Win 2022 works, and Win 2025 does not, then I ain't using Win 2025 (for clusters) in this AHV environment. Standalone, non-clustered Win 2025 VMs show no issues; only clusters do.
It doesn't matter whether the shared storage is iSCSI or SCSI direct attach, or whether I use a disk quorum or a File Share Witness. The cluster either goes down completely, or shows errors, when rebooting one of the nodes (even if the node has no resources running on it).

But Win 2022 works perfectly every time. I can reboot each node in sequence, all resources move as they should, and the cluster itself never goes down or offline.

Philip Elder

Aug 21, 2025, 3:50:04 PM
to ntsys...@googlegroups.com

Nutanix keeps their cards very close to their chest (one of the main reasons we will never partner with them), so searching out forum posts about issues on their platform is virtually unobtainium.

 

So, that does not exonerate AHV, since we're building clusters on Windows Server 2025, as are my fellow MVPs, and other than the zero CPU usage in Task Manager bug I've not seen much of anything else since GA.

 

I'd get the Nutanix support folks to set up an identical platform and cluster setup to see if they hit it. They should be performing root cause analysis on the problem, since it's on their platform. They've got a front line to Microsoft, too.
