VMI starts very slowly


Meteor Cai

Apr 23, 2025, 5:04:38 AM
to kubevirt-dev
In my KubeVirt environment, I create VMIs with a nodeSelector that binds them to several Intel hosts. When no hostDevices are specified, a VMI starts normally within one minute and its IP address is pingable.
When hostDevices are specified, the VMI starts very slowly: it takes at least 10 minutes, sometimes longer, before it starts successfully and its IP address responds to ping.
Logging in to the VMI via SSH and checking the OS boot log, the timestamps show that the OS did not get stuck during boot; it finished booting within one minute.
It looks like most of the time is spent attaching the host device to the VMI.
All the other AMD worker nodes behave normally.

In this situation, how should I troubleshoot?
Where could the problem be?
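For context, the hostDevices stanza in the VMI spec looks roughly like the following (the node, device alias, and resource names here are hypothetical placeholders, not the actual values from my cluster):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: example-vmi
spec:
  nodeSelector:
    kubernetes.io/hostname: intel-worker-1        # hypothetical node name
  domain:
    devices:
      hostDevices:
        - name: hostdev1                          # arbitrary device alias
          deviceName: vendor.com/example_device   # hypothetical resource name
    resources:
      requests:
        memory: 2Gi
```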

Luboslav Pivarc

Apr 23, 2025, 10:01:53 AM
to Meteor Cai, kubevirt-dev
Hi,

The first step would be to look at the VMI status, where we have phaseTransitionTimestamps. This will tell you in which phase most of the time is spent, and it will narrow down where to look.

--Lubo

--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/kubevirt-dev/29c33661-5b58-44f3-96b8-06484c1ca8d1n%40googlegroups.com.

Meteor Cai

Apr 23, 2025, 9:22:31 PM
to kubevirt-dev
root@master:~# kubectl get pod
NAME                                                            READY   STATUS    RESTARTS   AGE
virt-launcher-kubevirt-moduledaily-1l300-02-221308957-2-mht52   3/3     Running   0          7m50s
root@master:~# kubectl get vmi
NAME                                        AGE     PHASE     IP               NODENAME         READY
kubevirt-moduledaily-1l300-02-221308957-2   7m55s   Running   172.22.248.226   workernode146   True
root@master:~# ping 172.22.248.226
PING 172.22.248.226 (172.22.248.226) 56(84) bytes of data.
From 172.22.248.192 icmp_seq=1 Destination Host Unreachable
From 172.22.248.192 icmp_seq=2 Destination Host Unreachable
From 172.22.248.192 icmp_seq=3 Destination Host Unreachable
...
 
The pod and VMI statuses have stayed at "Running" the whole time, with no error messages.

Luboslav Pivarc

Apr 24, 2025, 3:57:34 AM
to Meteor Cai, kubevirt-dev
Hi

On Thu, Apr 24, 2025 at 3:22 AM Meteor Cai <meteor...@gmail.com> wrote:
root@master:~# kubectl get pod
NAME                                                            READY   STATUS    RESTARTS   AGE
virt-launcher-kubevirt-moduledaily-1l300-02-221308957-2-mht52   3/3     Running   0          7m50s
root@master:~# kubectl get vmi
NAME                                        AGE     PHASE     IP               NODENAME         READY
kubevirt-moduledaily-1l300-02-221308957-2   7m55s   Running   172.22.248.226   workernode146   True
root@master:~# ping 172.22.248.226
PING 172.22.248.226 (172.22.248.226) 56(84) bytes of data.
From 172.22.248.192 icmp_seq=1 Destination Host Unreachable
From 172.22.248.192 icmp_seq=2 Destination Host Unreachable
From 172.22.248.192 icmp_seq=3 Destination Host Unreachable
...
 
The status of pod and vmi has been constantly handling "Running" without any exception prompts.

What I meant was kubectl get vmi <name-of-vmi> -o=jsonpath='{.status.phaseTransitionTimestamps}'; this will show where the slowdown happens.

-Lubo
 



Meteor Cai

Apr 24, 2025, 4:40:46 AM
to kubevirt-dev
root@master:~# kubectl get vmi
NAME                                        AGE   PHASE     IP               NODENAME         READY
kubevirt-moduledaily-1l300-02-221308957-2   12m   Running   172.22.185.136   wokernode1   True
root@master:~# ping 172.22.185.136
PING 172.22.185.136 (172.22.185.136) 56(84) bytes of data.
From 172.22.185.131 icmp_seq=1 Destination Host Unreachable
From 172.22.185.131 icmp_seq=2 Destination Host Unreachable
From 172.22.185.131 icmp_seq=3 Destination Host Unreachable
^C
--- 172.22.185.136 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3065ms
pipe 4
root@master:~# kubectl get vmi kubevirt-moduledaily-1l300-02-221308957-2 -o=jsonpath='{.status.phaseTransitionTimestamps}'
[{"phase":"Pending","phaseTransitionTimestamp":"2025-04-24T08:17:14Z"},{"phase":"Scheduling","phaseTransitionTimestamp":"2025-04-24T08:17:14Z"},{"phase":"Scheduled","phaseTransitionTimestamp":"2025-04-24T08:17:21Z"},{"phase":"Running","phaseTransitionTimestamp":"2025-04-24T08:17:43Z"}]


root@master:~# kubectl get vmi
NAME                                        AGE   PHASE     IP               NODENAME         READY
kubevirt-moduledaily-1l300-02-221308957-2   13m   Running   172.22.185.136   wokernode1   True
root@master:~# ping 172.22.185.136
PING 172.22.185.136 (172.22.185.136) 56(84) bytes of data.
64 bytes from 172.22.185.136: icmp_seq=1 ttl=63 time=1.41 ms
64 bytes from 172.22.185.136: icmp_seq=2 ttl=63 time=1.29 ms
^C
--- 172.22.185.136 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 1.290/1.350/1.411/0.060 ms
root@master:~# kubectl get vmi kubevirt-moduledaily-1l300-02-221308957-2 -o=jsonpath='{.status.phaseTransitionTimestamps}'
[{"phase":"Pending","phaseTransitionTimestamp":"2025-04-24T08:17:14Z"},{"phase":"Scheduling","phaseTransitionTimestamp":"2025-04-24T08:17:14Z"},{"phase":"Scheduled","phaseTransitionTimestamp":"2025-04-24T08:17:21Z"},{"phase":"Running","phaseTransitionTimestamp":"2025-04-24T08:17:43Z"}]
root@master:~# 


The phaseTransitionTimestamps output shows nothing abnormal.
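As a sanity check, the gaps between those transitions can be computed directly from the array, for example with jq (assuming jq 1.5+ is installed; the JSON below is the output pasted above, which in practice would come from the kubectl jsonpath query):

```shell
echo '[{"phase":"Pending","phaseTransitionTimestamp":"2025-04-24T08:17:14Z"},{"phase":"Scheduling","phaseTransitionTimestamp":"2025-04-24T08:17:14Z"},{"phase":"Scheduled","phaseTransitionTimestamp":"2025-04-24T08:17:21Z"},{"phase":"Running","phaseTransitionTimestamp":"2025-04-24T08:17:43Z"}]' |
jq -r '[.[].phaseTransitionTimestamp | fromdateiso8601] as $t
       | [.[].phase] as $p
       | range(1; length)
       | "\($p[. - 1]) -> \($p[.]): \($t[.] - $t[. - 1])s"'
# Pending -> Scheduling: 0s
# Scheduling -> Scheduled: 7s
# Scheduled -> Running: 22s
```

Every gap is a few seconds, which matches the observation that KubeVirt itself reports no abnormality.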

Luboslav Pivarc

Apr 24, 2025, 5:40:01 AM
to Meteor Cai, kubevirt-dev
So this shows that KubeVirt got the VMI into Running within 29 seconds, which is reasonable. It means the devices were already there and the guest should be booting. I am not sure where those 10 minutes could be spent. Do the guest logs correlate with 2025-04-24T08:17:43?
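One way to check the correlation, assuming a systemd-based guest (the commands and the timestamp below are illustrative): convert the Running transition time to epoch seconds and compare it against the guest's own boot records.

```shell
# On the management node: the Running transition as UTC epoch seconds (GNU date)
date -u -d '2025-04-24T08:17:43Z' +%s

# Inside the guest (via SSH or console), assuming systemd is present:
uptime -s                          # when this boot started
systemd-analyze blame | head       # per-unit boot timings, to spot a stalled unit
journalctl -b -o short-iso | head  # first journal lines of the current boot
```

If the guest's boot start is close to the Running transition but the first pingable moment is ~10 minutes later, the delay is inside the guest (e.g. a unit or device probe stalling), not in KubeVirt.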

-Lubo

 

Meteor Cai

Apr 25, 2025, 4:04:30 AM
to kubevirt-dev
Let me share an update with you.
Today, after I gradually upgraded KubeVirt from 1.2.0 to 1.5.0, the problem disappeared.
Thank you again for your support.
