Hi all,
Over the weekend, we needed to manually merge (in PowerShell) some Hyper-V differencing (AVHDX) files that aren’t showing up as checkpoints in the Hyper-V console. We’ve been testing after each merge, and selecting the newly merged AVHDX file for the VM gives us a message that it’s unable to select that VHD because a merge is pending. There are about 17 more differencing files to go before we get back to the original VHD.
We’ve tried rebooting the host several times, and it does not seem to change the situation. It’s Monday morning now, and our only plan is to continue the manual merge until we are down to a single VHD (as it should be) – but this is the primary file server, and our ETA on the merge process is a couple more days.
Any suggestions on getting past the “merge pending” message??
--P
-----
If you never know what your infrastructure is, you can never know if it’s been breached.
Poppy Lochridge (she/her)
NetCorps
1385-B Oak Street
Eugene, OR 97401
I think you are resolving it properly, but what are you trying to do that you are being stopped from?
The key thing is, I think, to determine what was causing the differencing files to be created, so that this isn’t an issue moving forward. Is it due to a backup failure?
--
You received this message because you are subscribed to the Google Groups "ntsysadmin" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
ntsysadmin+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/ntsysadmin/CO1PR08MB706173617003BEED84236F88CAB72%40CO1PR08MB7061.namprd08.prod.outlook.com.
Did you just look in the Hyper-V Manager GUI for those checkpoints? Or did you also verify with Get-VMSnapshot via PoSH?
(Yes, MS still hasn’t gotten their naming convention standardized for this.)
I’ve had some that showed in PowerShell and not the GUI.
Is this part of a cluster where you could live migrate that VM to another host and see if that allows the merges to start automatically?
From: ntsys...@googlegroups.com <ntsys...@googlegroups.com>
On Behalf Of Michael B. Smith
Sent: Monday, July 29, 2024 10:02 AM
To: ntsys...@googlegroups.com
Subject: [ntsysadmin] RE: Hyper-V manual merge - problems booting
It was a backup failure that caused the problem, and we identified that in 2022 and resolved the backup failure then – but when we started merging differencing files, the time investment was so large that we did a few and decided to defer. We deferred a little too long, and ended up in a drive space crunch this month. Yeah, a mistake, and I won’t go into the reasons why the mistake happened.
What we’d like to do is boot the guest VM from the current, already merged AVHDX file, work during the week, and then pick up the manual merge process in scheduled downtime. Right now, we’re unable to start the guest VM because the linked VHD no longer exists, and we’re unable to link to a different VHD because it says a merge is pending.
--P
From: ntsys...@googlegroups.com <ntsys...@googlegroups.com>
On Behalf Of Michael B. Smith
Sent: Monday, July 29, 2024 7:02 AM
To: ntsys...@googlegroups.com
The root vhd/vhdx no longer exists? That’s bad news. I don’t think you have any option but to continue the merge process. And you may find, at the end, that a restore is required anyway.
We looked in the Hyper-V Manager GUI for the checkpoints and they do not/did not exist there.
We ran this process, which describes the original problem exactly: https://learn.microsoft.com/en-us/troubleshoot/windows-server/virtualization/merge-checkpoints-with-many-differencing-disks
We followed it using PowerShell and Get-VM, which DID produce a series of differencing files and commands for the manual merge.
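As an illustration of the ordering only (not the Microsoft article's own script), here is a minimal Python sketch. It assumes you have already exported each disk's parent path, for example the ParentPath property that PowerShell's Get-VHD reports, into a dictionary; the file names and the `merge_plan` helper are hypothetical. It starts at the disk the VM currently points at and lists the child-into-parent merges leaf first, which matches the merging described in this thread.

```python
from typing import Dict, List, Optional, Tuple


def merge_plan(leaf: str, parent_of: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (child, parent) merge pairs, starting from the VM's current disk.

    parent_of maps each disk file to its parent file, or to None for the
    root VHDX. Raises ValueError if a link is missing or the chain loops.
    """
    plan: List[Tuple[str, str]] = []
    seen = {leaf}
    current = leaf
    while True:
        if current not in parent_of:
            raise ValueError(f"no parent information for {current}")
        parent = parent_of[current]
        if parent is None:  # reached the root VHDX, nothing left to merge
            return plan
        if parent in seen:
            raise ValueError(f"chain loops back to {parent}")
        plan.append((current, parent))
        seen.add(parent)
        current = parent


# Hypothetical three-level chain; real names come from your own export.
chain = {
    "fs01_C.avhdx": "fs01_B.avhdx",
    "fs01_B.avhdx": "fs01_A.avhdx",
    "fs01_A.avhdx": "fs01.vhdx",
    "fs01.vhdx": None,
}
for child, parent in merge_plan("fs01_C.avhdx", chain):
    print(f"merge {child} into {parent}")
```

Each pair is one merge of a child disk into its immediate parent, so you can test the VM between steps, as was done here.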
We don’t have a cluster, sadly, but we do have a secondary server – I can check into the possibility of setting that up as a host and trying a migration to it.
--P
Thankfully, the root VHDX does exist. We’re about 16 merges away from it, but it does exist.
“the linked VHD no longer exists” – the AVHDX file that Hyper-V says is linked to the VM has already been merged, as have the next ~6 files in the chain.
I have the same problem with a guest cluster and VHDS files…
Have you tried live migrating the VM? It should release the ghost lock on the file and let the merge proceed.
From: 'Poppy Lochridge' via ntsysadmin <ntsys...@googlegroups.com>
Sent: Monday, July 29, 2024 19:30
To: ntsys...@googlegroups.com
Subject: [ntsysadmin] RE: Hyper-V manual merge - problems booting
Did you get this resolved?
One can trigger a merge for orphaned .AVHDX files by creating a snapshot/checkpoint in Hyper-V Management for the impacted VM.
That doesn’t always work though. PowerShell is the way to go if it does not clear them all up.
Process Explorer will give you the ability to find out where the locks are coming from. My guess is the backup software.
This situation is common. In my experience it indicates a weak storage subsystem, as we see it where IOPS are lower.
What happens:
Step 8 is where things get messed up: the clean-up process is still pending when the later steps are called, due to lagging I/O.
As a result, we get what’s called a “ghost process”, orphaned because the OS goes, “Okay, VSS is released, delete that checkpoint/snapshot and yup, we’re all good now!” Fortunately, that .AVHDX file remains; otherwise we’d be hooped.
As above, trigger a Checkpoint/Snapshot in Hyper-V Management to see if they get merged. In most cases they will.
Eric has a good article here:
https://www.altaro.com/hyper-v/clean-up-hyper-v-checkpoint/
This Veeam Forums post has good info, but you have to gather it up:
https://forums.veeam.com/microsoft-hyper-v-f25/9-5-hyper-v-known-issues-t38927-270.html
Philip Elder MCTS
Senior Technical Architect
Microsoft High Availability MVP
E-mail: Phili...@mpecsinc.ca
Phone: +1 (780) 458-2028
Web: www.mpecsinc.com
Blog: blog.mpecsinc.com
Twitter: Twitter.com/MPECSInc
Skype: MPECSInc.
Please note: Although we may sometimes respond to email, text and phone calls instantly at all hours of the day, our regular business hours are 8:00 AM - 5:00 PM, Monday thru Friday.
From: 'Poppy Lochridge' via ntsysadmin <ntsys...@googlegroups.com>
Sent: Monday, July 29, 2024 07:59
To: NTSysAdmin <ntsys...@googlegroups.com>
Subject: [ntsysadmin] Hyper-V manual merge - problems booting
We did! Ended up detaching the hard drive from the affected VM entirely, then attaching a new SCSI virtual drive with the currently merged AVHDX. It booted, and folks were able to access their data.
We still have about 9 AVHDX files to merge, but this gives us breathing room to schedule the downtime to do those.
To be determined: who will do the remaining merges and when, AND how to replace the backup system that caused the initial problem.
--P
From: ntsys...@googlegroups.com <ntsys...@googlegroups.com>
On Behalf Of Philip Elder
Sent: Thursday, August 1, 2024 1:59 PM
To: ntsys...@googlegroups.com
Is there enough free space on the host somewhere to set up a new .VHDX file, attach it to the VM, use Beyond Compare (Run As Admin, set NTFS Permission Copy, Date Creation Copy) to create a copy, and finally just drop the bad one?
Note that this will have an impact on backups depending on the backup in use. So, keep that in mind.
Oops … hit SEND too quick.
If that does work, then you can schedule some down time, drop the bad OS .VHDX, create and attach a new one, and restore that OS partition. It should be relatively quick if it is just the OS and some app/server files and folders.
Yes, probably. I think part of our correction of error process might end up being a discussion of whether virtualizing the file server was the right plan in the first place, and if we should migrate the bulk of the stored files to a physical server instead. So it might be a moot point by the time we finish mapping out what we want to do for prevention.
There are simple scripts out there that send an e-mail if there are orphaned .AVHDX files on the host(s)/nodes.
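A minimal Python sketch of the detection half of such a check, assuming you can export two lists: every .avhdx file found in the VM storage folders, and every disk path Hyper-V still uses, including the full parent chain behind each VM's current disk (otherwise legitimate checkpoints would be flagged). The helper names `scan_avhdx` and `find_orphans` are hypothetical, and this is not one of the scripts referred to above.

```python
from pathlib import Path
from typing import Iterable, List


def scan_avhdx(folder: str) -> List[str]:
    """List every .avhdx file under folder, searching subfolders too."""
    return [str(p) for p in Path(folder).rglob("*") if p.suffix.lower() == ".avhdx"]


def find_orphans(on_disk: Iterable[str], in_use: Iterable[str]) -> List[str]:
    """Return .avhdx paths on disk that no VM disk chain references.

    in_use must hold every disk in each VM's chain, not just the attached
    one. Paths are compared case-insensitively, as NTFS usually treats them.
    """
    used = {p.lower() for p in in_use}
    return sorted(
        p for p in on_disk
        if p.lower().endswith(".avhdx") and p.lower() not in used
    )
```

Mailing the result (for example with Python's smtplib) is left out to keep the sketch short.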
Putting a file server on iron is not a good use of resources IMO.
One other point: Are the NTFS partitions _within_ the guest at least 25% free?
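To check the 25% guideline from inside the guest, a short Python sketch using the standard library's shutil.disk_usage could look like this; the helper names and the drive path in the comment are assumptions for illustration.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of a volume that is free, from 0.0 to 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def meets_free_target(path: str, target: float = 0.25) -> bool:
    """True if the volume holding path has at least `target` free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_fraction(usage.total, usage.free) >= target


# Run inside the guest, e.g. meets_free_target("D:\\") for a data volume.
```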
Also, are there any collisions between VSS for Volume Shadow Copy (Previous Versions) and the backup product’s schedules? Make sure they are at least 5-10 minutes apart so as to avoid simultaneous VSS calls from in-guest and the backup software. This can corrupt the data within the guest.
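One way to catch schedule collisions is to compare start times directly. The Python sketch below is a rough check under stated assumptions: both schedules are daily, only start times matter (it ignores how long each job runs), and the example times are hypothetical. It reports the smallest gap between any in-guest shadow copy start and any backup start, wrapping around midnight.

```python
from datetime import time
from typing import Iterable

MINUTES_PER_DAY = 24 * 60


def _minute_of_day(t: time) -> int:
    return t.hour * 60 + t.minute


def min_gap_minutes(shadow_starts: Iterable[time], backup_starts: Iterable[time]) -> int:
    """Smallest gap in minutes between any shadow copy start and any backup start.

    Schedules are treated as daily, so 23:55 and 00:05 are 10 minutes apart.
    Returns MINUTES_PER_DAY if either list is empty.
    """
    backups = [_minute_of_day(t) for t in backup_starts]
    best = MINUTES_PER_DAY
    for s in shadow_starts:
        for b in backups:
            diff = abs(_minute_of_day(s) - b)
            best = min(best, diff, MINUTES_PER_DAY - diff)
    return best


# Hypothetical schedules: Previous Versions at 07:00 and 12:00, backup at 12:03.
gap = min_gap_minutes([time(7, 0), time(12, 0)], [time(12, 3)])
if gap < 5:
    print(f"Only {gap} minutes apart: move one of the schedules")
```

A result under 5 (or 10, if you want more margin) suggests moving one of the schedules.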
And one more: Is there enough VSC space allocated? If the file server has a largish repository of 2TB or more and there’s a fair amount of churn, then a full VSS VSC cache can cause issues with the snapshot process.
A question or two about that setup:
1: Are the LUNs evenly divided between the two controllers?
2: Are snapshots enabled and if they are do the destination snapshots get tested?
Just curious more than anything as our primary cluster type is HCI (S2D/AzSHCI) and some standalone Hyper-V + Storage Spaces setups.
Something sure sounds off there.
DHCP?
Reservations should be set up for all static IP addresses or at the very least exclusions set for blocks assigned to devices/servers with static IP addresses. Ouch, that was a “should not have happened” situation plus a “we failed to document” one as well I think.
Planning is one thing, having good documentation and an understanding of how things are set up is yet another. :0(
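A quick audit along these lines can be scripted. The Python sketch below assumes you have already exported the scope range, exclusion ranges, reservations, and the list of statically assigned addresses; IPv4 is assumed and the helper name is hypothetical. It flags static addresses that fall inside the scope without being excluded or reserved. A reservation only protects the address if it is reserved for that same device's MAC, which this sketch does not check.

```python
import ipaddress
from typing import Iterable, List, Tuple


def unprotected_statics(scope_start: str, scope_end: str,
                        exclusions: Iterable[Tuple[str, str]],
                        reservations: Iterable[str],
                        statics: Iterable[str]) -> List[str]:
    """Return static IPs that DHCP could still lease to another client.

    An address is considered safe if it is outside the scope range, inside
    an exclusion range, or listed as a reservation. IPv4 only is assumed.
    """
    start = ipaddress.ip_address(scope_start)
    end = ipaddress.ip_address(scope_end)
    excluded = [(ipaddress.ip_address(a), ipaddress.ip_address(b)) for a, b in exclusions]
    reserved = {ipaddress.ip_address(r) for r in reservations}
    at_risk = []
    for s in statics:
        ip = ipaddress.ip_address(s)
        if not (start <= ip <= end):
            continue  # outside the pool, so DHCP never hands it out
        if ip in reserved or any(lo <= ip <= hi for lo, hi in excluded):
            continue  # excluded or reserved, so not offered to other clients
        at_risk.append(s)
    return at_risk
```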
We tried the DHCP high availability thing for a couple of infrastructure domains. It should just work, right?
Nah, it didn’t. So, we stick with static IPs. It’s easier to manage with PowerShell too.