Hello everyone,
I would like to discuss possible approaches to making network interface hotplug consistent with the VMRolloutStrategy.
The Issue:
Currently, CPU and memory hotplug are gated by the cluster-wide VMRolloutStrategy. When the strategy is not set to LiveUpdate, changes set a RestartRequired condition and are staged until a restart.
However, network interface hotplug/hotunplug bypasses this gate: the VMI spec is patched immediately regardless of the rollout strategy. This inconsistency exists because NIC hotplug predates the LiveUpdate strategy and was left ungated for backwards compatibility. Additionally, when the dynamic-networks-controller is not deployed, a MigrationRequired condition is added but no migration is triggered, so the hotplug is never completed. Since KubeVirt 1.5, non-admin users cannot migrate a VM themselves to complete it manually.
Proposed Options:
Option 1: Gate NIC hotplug/hotunplug behind VMRolloutStrategy
The VM controller would treat network changes like CPU/memory changes. If the strategy is not LiveUpdate, it would set the RestartRequired condition on the VM and leave the VMI spec unpatched.
- Pros: Simplest path to consistency.
- Cons: Breaking change for users relying on in-place NIC hotplug with a non-LiveUpdate strategy, as gating at the VM controller level prevents the necessary VMI and pod patches.
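
To make the Option 1 behavior concrete, here is a minimal sketch of the gating decision. It is not KubeVirt code: the Decision type and decideNICChange function are hypothetical names invented for illustration, and RolloutStrategy is a simplified stand-in for the cluster-wide VMRolloutStrategy.

```go
package main

import "fmt"

// RolloutStrategy is a simplified stand-in for the cluster-wide VMRolloutStrategy.
type RolloutStrategy string

const (
	Stage      RolloutStrategy = "Stage"
	LiveUpdate RolloutStrategy = "LiveUpdate"
)

// Decision is a hypothetical summary of what the VM controller would do
// with a pending NIC change under Option 1.
type Decision struct {
	PatchVMI        bool // propagate the interface change to the VMI spec
	RestartRequired bool // set the RestartRequired condition on the VM
}

// decideNICChange (hypothetical) gates NIC changes the same way
// CPU/memory changes are gated today.
func decideNICChange(strategy RolloutStrategy, nicsChanged bool) Decision {
	if !nicsChanged {
		return Decision{}
	}
	if strategy == LiveUpdate {
		return Decision{PatchVMI: true}
	}
	// Any non-LiveUpdate strategy: stage the change until a restart.
	return Decision{RestartRequired: true}
}

func main() {
	for _, s := range []RolloutStrategy{Stage, LiveUpdate} {
		fmt.Printf("%s: %+v\n", s, decideNICChange(s, true))
	}
}
```

Note that under a non-LiveUpdate strategy the VMI is never patched in this sketch, which is exactly the breaking change listed under Cons.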
Option 2: VMI reports restart requirement to the VM controller
Keep the existing VM-to-VMI sync so the VMI spec is always patched. The VMI would signal (via a condition or status field) that a restart is required when the strategy is not LiveUpdate. The VM controller would then set RestartRequired on the VM object.
- Pros: Preserves in-place hotplug functionality and maintains consistency with CPU/memory behavior without breaking workflows.
- Cons: Increases complexity in the VMI-to-VM feedback loop.
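
For comparison, a rough sketch of the Option 2 flow follows. Again, this is not KubeVirt code: VMState, VMIState, the RestartRequiredSignal field, and both functions are hypothetical. In particular, the sketch assumes the signal is set during the VM-to-VMI sync; which component would actually set it is an open design question.

```go
package main

import "fmt"

// RolloutStrategy is a simplified stand-in for the cluster-wide VMRolloutStrategy.
type RolloutStrategy string

const (
	Stage      RolloutStrategy = "Stage"
	LiveUpdate RolloutStrategy = "LiveUpdate"
)

// VMState and VMIState are hypothetical, trimmed-down views of the VM and VMI.
type VMState struct {
	Interfaces      []string // NIC names in the VM template
	RestartRequired bool     // RestartRequired condition on the VM
}

type VMIState struct {
	Interfaces            []string // NIC names in the VMI spec
	RestartRequiredSignal bool     // hypothetical condition or status field
}

func sameInterfaces(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// syncInterfaces always patches the VMI (existing behavior is kept) and
// raises the signal when the strategy is not LiveUpdate (assumption: the
// signal is set here rather than by another component).
func syncInterfaces(vm *VMState, vmi *VMIState, strategy RolloutStrategy) {
	if sameInterfaces(vm.Interfaces, vmi.Interfaces) {
		return
	}
	vmi.Interfaces = append([]string(nil), vm.Interfaces...)
	if strategy != LiveUpdate {
		vmi.RestartRequiredSignal = true
	}
}

// propagateRestartRequired is the VMI-to-VM feedback step.
func propagateRestartRequired(vm *VMState, vmi *VMIState) {
	if vmi.RestartRequiredSignal {
		vm.RestartRequired = true
	}
}

func main() {
	vm := &VMState{Interfaces: []string{"default", "secondary"}}
	vmi := &VMIState{Interfaces: []string{"default"}}
	syncInterfaces(vm, vmi, Stage)
	propagateRestartRequired(vm, vmi)
	fmt.Printf("vmi=%+v vm=%+v\n", *vmi, *vm)
}
```

The extra propagateRestartRequired step is where the added complexity listed under Cons shows up: the VM controller has to watch for and react to state reported by the VMI.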
I would appreciate feedback or suggestions for other approaches we should consider.
Best regards,
Orel