Vmtools Service Timeout



Jessia Adachi


Aug 4, 2024, 3:54:33 PM

to ciabhakasflum

Been having a problem with one of the guests. The guest is Windows Server 2016, fully patched. It's logging VMware Tools timeouts in the event log, and I can also see Tools go from running to not running in the ESXi interface. The error messages are:

Event ID 7011

A timeout (30000 milliseconds) was reached while waiting for a transaction response from the VMTools service.

Event ID 7046

The following service has repeatedly stopped responding to service control requests: VMware Tools

Contact the service vendor or the system administrator about whether to disable this service until the problem is identified.

You may have to restart the computer in safe mode before you can disable the service.

Steps taken: I updated ESXi to the HPE image specified above. I updated VMware Tools from 10.2.1.4164 to 10.3.2-9925305 and then ultimately to the latest, 10.3.5-10430147, with no difference. The error first showed up on 10/2/18. The guest is a Veeam Backup server that seems to run OK despite the error. The ESXi host has three AWS gateway appliances running on it without any issues.
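Later replies in this thread report the same 30000 ms timeout against other services, so it can help to tally which services are actually hitting it. A minimal sketch, assuming the Event ID 7011 messages have already been exported from Event Viewer as plain English strings; the function names and the regex are illustrative, not part of any Windows API.

```python
import re
from collections import Counter

# Matches the Event ID 7011 text quoted above, e.g.
# "A timeout (30000 milliseconds) was reached while waiting for a
#  transaction response from the VMTools service."
# Assumption: English-language messages with exactly this wording.
TIMEOUT_7011 = re.compile(
    r"A timeout \((\d+) milliseconds\) was reached while waiting for a "
    r"transaction response from the (.+?) service\."
)


def parse_7011(message):
    """Return (timeout_ms, service_name), or None if the text does not match."""
    m = TIMEOUT_7011.search(message)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def count_timeouts(messages):
    """Count how many 7011 timeouts each service produced."""
    counts = Counter()
    for msg in messages:
        parsed = parse_7011(msg)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

The lazy `(.+?)` stops at the first lowercase " service." so multi-word names such as "NetBackup Service Layer" are captured whole.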

Dear Community, we are using an Orchestrator workflow to copy files to a Windows 2008 R2 virtual machine, and for a few weeks there has been a timeout error. In the Windows 'System' events we can see: 'A timeout (30000 milliseconds) was reached while waiting...

You mentioned that you tried installing the latest version and still see the same issue. If there is a known bug, it is also worth trying an older version of the tools and checking the behavior.

We have an issue on a few of our Windows 2016 servers that started sometime in the last few months and seems to be getting progressively worse. Resetting the system seems to right it for a few days, but inevitably it will slide into the same useless state and require another hard reset. So far the system has bounced back, with sometimes nothing more than a chkdsk, but on our SQL servers this can sometimes take a few minutes of recovery.

These systems run well for a few days, but then we notice that we can no longer connect via RDP. If we try to log in on the VM console, it usually hangs on "Waiting for user profile service", which never resolves, and the console is stuck on that login until reset. The SQL or web service on the VM continues to run as if there is no problem for several hours, but eventually we notice that the IP address vCenter shows for the server disappears and the box is completely isolated. We have to hard reset to restore service.

I have run SFC on all of these servers and no corruption is reported. I ran the DISM tools and they do report that the component store can be repaired, but the DISM and CBS logs show no errors, only Info and Warning entries. We don't seem to have any problem installing Windows updates; we are patched up to the March roll-up. These servers can't reach the MS Update servers, so I'm not sure how to clear these DISM issues. I have injected from a KB CAB before, but if the logs don't identify a KB, then what?

This behavior, where it works OK for a few days and then services start to die off, sounds to me like a memory leak in some component, but I'm sure there could be other causes. We recently installed Elastic Metricbeat to see if we can spot the process that might be running amok.
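Once per-process memory is being collected, one way to spot the process running amok is to fit a trend line per process and flag steady growth. A minimal sketch, assuming the samples have been exported as (hours since boot, memory in MB) pairs per process; the data layout, the process names, and the 10 MB/hour threshold are all hypothetical. A steady upward slope only suggests a leak; it does not prove one.

```python
def slope(points):
    """Least-squares slope of (x, y) points; 0.0 if all x values are equal."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def leak_suspects(samples, min_mb_per_hour=10.0):
    """samples: {process_name: [(hours_since_boot, memory_mb), ...]}.

    Returns (name, MB per hour) for processes whose memory grows at least
    min_mb_per_hour, steepest growth first.
    """
    suspects = []
    for name, points in samples.items():
        if len(points) < 2:
            continue  # need at least two samples to estimate a trend
        rate = slope(points)
        if rate >= min_mb_per_hour:
            suspects.append((name, rate))
    return sorted(suspects, key=lambda item: item[1], reverse=True)
```

Fitting a slope rather than comparing first and last samples makes the check less sensitive to a single spike.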

So I am looking for some tips on things to watch that might cause RDP/User profile service to die, or a NIC to suddenly stop working. I assume that the VMware tools installed on this server are getting killed or choked out by this supposed runaway process.

We got nowhere with this. It just stopped happening, so that was $500 wasted on MS Technical Services. I am going to assume this was some kind of conflict between our antivirus suite and Microsoft Trusted Installer; that was a common thing we saw in the log files when the crash occurred. I guess a Windows update or a McAfee update resolved the issue at some unknown time. I just hope it doesn't come back.

I have the same problem. I think it started in late February or early March.

At first I would reset the VMs and it would last a few weeks; lately it's a few days, with luck.

They all stop responding, and if I try to log on from the console it hangs on the user profile.

The only "error" I can see in the logs is this:

svchost (1068) SoftwareUsageMetrics-Svc: A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log"

I don't know whether this is the actual problem or a byproduct of the hang...

I moved the VM to another server and the problem is the same.

I have Malwarebytes Anti-Ransomware on the servers; I think I'm going to disable it to see if that solves it.

It seems the same for us. We found out the IP is not actually disappearing; it's just the VMTools service being taken down that makes the IP disappear in vCenter. The VM still pings; it's just that all the services have stopped.

I frequently get this message in Event Viewer: "A timeout (30000 milliseconds) was reached while waiting for a transaction response from the NetBackup Service Layer service." When I check, I find no backups running on the Master (the backups are launched from a scheduler, and there is quite a delay between jobs being launched from TWS and reaching the Master).

backup node file [passphrase ]
Create a backup of an NSX Key Manager node. If you do not provide a passphrase on the command line, you will be prompted to enter one. The passphrase is used to encrypt the backup. If you forget the passphrase, you will not be able to restore the backup.
Important: This backup command is one part of the backup process. You must complete all backup and restore tasks in the correct order. See the NSX-T Administration Guide for information and instructions about performing backups and restores.
Option descriptions: Filename argument (allowed pattern: ^[^/ *;&]+$); Backup passphrase
Example:
nsx-keymanager-1> backup node file backup-node-timestamp.tar.gz
Passphrase:
nsx-keymanager-1>
Mode: Basic
Availability: Key Manager
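The allowed filename pattern can be checked before running the command. A small Python sketch using the pattern exactly as documented above; the helper name is hypothetical and this is not part of the NSX CLI itself.

```python
import re

# Allowed pattern for the backup filename argument, copied from the
# command reference: one or more characters, none of which is
# "/", a space, "*", ";" or "&".
ALLOWED_FILENAME = re.compile(r"^[^/ *;&]+$")


def is_valid_backup_filename(name):
    """Return True if name matches the documented allowed pattern."""
    return ALLOWED_FILENAME.fullmatch(name) is not None
```

For example, `backup-node-timestamp.tar.gz` from the example above passes, while a name containing a space or a slash is rejected.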

clear auth-policy vidm lb-extern enabled
Clear the external load balancer enabled property.
Example:
nsx-manager-1> clear auth-policy vidm lb-extern enabled
nsx-manager-1>
Mode: Basic
Availability: Manager, Policy Manager

clear banner
Clear the security banner or message of the day. The banner is reset to the system default banner.
Example:
nsx> clear banner
nsx>
Mode: Basic
Availability: Controller, Edge, Key Manager, Manager, Policy Manager, Public Cloud Gateway

clear bfd-session local-ip remote-ip stats
Clear the statistics for the specified BFD session.
Option description: Network IP address argument
Example:
nsx-edge-1> clear bfd-session local-ip 192.168.250.60 remote-ip 192.168.250.61 stats
nsx-edge-1>
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear dataplane flow-cache stats
Clear flow cache statistics for all fastpath cores.
Example:
nsx-edge-1> clear dataplane flow-cache stats
nsx-edge-1>
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear hardening-policy mandatory-access-control enabled
This command disables mandatory access control on the node. Usage: clear hardening-policy mandatory-access-control enabled
Example:
nsx-edge-1> clear hardening-policy mandatory-access-control enabled
Mandatory Access Control is disabled.
Mode: Basic
Availability: Controller, Edge, Manager, Policy Manager, Public Cloud Gateway

clear high-availability channel local-ip remote-ip stats
Clear statistics for the specified high-availability channel.
Option description: Network IP address argument
Example:
nsx-edge-1> clear high-availability channel local-ip 30.0.246.232 remote-ip 30.0.29.0 stats
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear high-availability channels stats
Clear statistics for all high-availability channels.
Example:
nsx-edge-1> clear high-availability channels stats
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear high-availability history state
Clear the high availability state history for the logical router in the VRF context.
Example:
nsx-edge-1(tier1_sr)> clear high-availability history state
nsx-edge-1(tier1_sr)>
Mode: Tier0_sr, Tier1_sr
Availability: Edge, Public Cloud Gateway

clear high-availability session local-service-id peer-service-id stats
Clear statistics for the specified high-availability session.
Option description: Service id (0-65535)
Example:
nsx-edge-1> clear high-availability session local-service-id 101 peer-service-id 101 stats
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear high-availability sessions stats
Clear statistics for all high-availability sessions.
Example:
nsx-edge-1> clear high-availability sessions stats
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear interface
Delete the specified VLAN network interface and all its configuration, or the specified bond configuration, or both if a VLAN was configured over the bond. Users must configure an alternate interface for management.
Option description: Configurable network interface argument
Example:
nsx-edge> clear interface eth0.11
Deleted interface eth0.11. The system does not have a management IP address, you may configure one.
nsx-edge> clear interface bond0
Deleted interface bond0. The system does not have a management IP address, you may configure one.
nsx-edge> clear interface bond0.50
Deleted interface bond0.50. The system does not have a management IP address, you may configure one.
Mode: Basic
Availability: Edge, Public Cloud Gateway
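The three cases in the example (a VLAN interface, a bond, and a VLAN over a bond) can be told apart from the interface name alone. A minimal sketch, assuming the naming convention seen in the examples above ("eth0", "bond0", and VLAN sub-interfaces written as parent.vlan-id); the helper is hypothetical and only inspects the name, it does not query the edge node.

```python
import re

# Assumed naming convention, taken from the examples: a parent such as
# "eth0" or "bond0", optionally followed by ".<vlan id>" (eth0.11, bond0.50).
IFACE = re.compile(r"^(?P<parent>[a-z]+\d+)(?:\.(?P<vlan>\d+))?$")


def classify_interface(name):
    """Return (kind, parent, vlan_id), or None if the name is not recognized.

    kind is "vlan" for a VLAN sub-interface, "bond" for a bond, and
    "interface" for any other plain interface.
    """
    m = IFACE.match(name)
    if m is None:
        return None
    parent, vlan = m.group("parent"), m.group("vlan")
    if vlan is not None:
        return ("vlan", parent, int(vlan))
    kind = "bond" if parent.startswith("bond") else "interface"
    return (kind, parent, None)
```

For a "vlan" result, the parent tells you whether the VLAN sits on a physical interface or on a bond.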

clear interface ip
Remove all network configuration from the specified interface.
Option description: Configurable network interface argument
Example:
nsx-edge> clear interface eth0 ip
nsx-edge>
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear interface plane
Clear the network interface plane configuration.
Option description: Configurable network interface argument
Example:
nsx-edge> clear interface eth0 plane
nsx-edge>
Mode: Basic
Availability: Edge, Public Cloud Gateway

clear lldp neighbors
Delete LLDP neighbor information on the given device.
Option description: LLDP interface argument
Example:
nsx-edge-1> clear lldp neighbors eth0
Mode: Basic
Availability: Edge, Public Cloud Gateway