Hello all,
With feature freeze (FF) looming, I'd appreciate some help
understanding why the guestlog tests are so unstable in the following
PR, which introduces live update support for instance types:
instancetype: Support Live Updates
https://github.com/kubevirt/kubevirt/pull/11455
An example failure can be seen below:
https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/pr-logs/pull/kubevirt_kubevirt/11455/pull-kubevirt-e2e-k8s-1.30-sig-compute/1798767730631905280
I've been working with Felix for the last few days to find a
reproducer, but the failure only appears to reproduce in full CI runs
with the full series present. An attempt to bisect the series in CI
didn't get us anywhere, and I feel like I'm back to square one.
My working assumption has been that the issue is caused by the series
moving hot plug defaulting into the VMI mutation webhook, as these
tests only use VMIs. I had assumed this was causing memory pressure
within the VMIs, since we allocate the calculated max guest value, and
that this was somehow causing the instability in virtlogd etc., but
I've not been able to confirm that yet.
I'd appreciate any and all feedback here, as I'm very confused by
this behaviour.
Regards,
Lee