Hey,

We experienced something interesting today. If the majority of a cluster goes offline (let's say as a result of a power failure) and all the nodes come back online at the same time, every VM with runStrategy=Always gets started at pretty much the same time. For very large clusters, this massive start event puts a strain on the control plane components, network, storage, and just about everything else. Under this strain we start seeing timeouts during VMI startup, which causes even more VMIs to be re-created to satisfy runStrategy=Always.

This has me thinking about two things.

1. I think we need to introduce a VM crashloop backoff for VMs with runStrategy=Always that continue to crash before their VMI ever reaches phase=Running.
2. I think we should consider a maximum queue length for how many VMIs we allow to be in the startup state (before phase=Running) at once.

Item 1 would prevent the control plane from thrashing during a VMI crash loop, and item 2 would prevent it from thrashing during a massive startup event.

Has anyone else given these types of scenarios any thought? Do these seem like reasonable approaches?

Thanks,
- David
--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/CAPjOJFsyD6eexGAoJ00giBiRSGcVzHikSe0Bn0avNo-K2CbhEQ%40mail.gmail.com.
Hi David,

I just read up on what runStrategy is, and I am quite confused about its usage. May I ask for some clarifications here? The API comment says runStrategy is mutually exclusive with the running flag.
1. If I set running to true, a VMI will be able to start and stop - does that correspond to the Manual strategy?
2. If I set running to true, then in your situation, will they all be brought up again after the nodes come back?
3. For the runStrategy RunStrategyRerunOnFailure, what failure are we talking about here? Are we continuously monitoring for VMI failure, or just the exit code?
As for the issue you see, is VMI recreation handled by a virt-controller?
Would some rate limiting at the virt-controller processing queue help?
Thanks,
Zang

On Wed, Jun 16, 2021 at 12:28 PM David Vossel <dvo...@redhat.com> wrote:
> [original message quoted above]
Hi Zang!

On Thu, Jun 17, 2021 at 12:51 AM 'Zang Li' via kubevirt-dev <kubevi...@googlegroups.com> wrote:

> The API comment says runStrategy is mutually exclusive with the running flag.

They're mutually exclusive simply because RunStrategy completely eclipses Running, so it would be possible to issue contradictory directives if both were allowed at the same time.

> 1. If I set running to true, a VMI will be able to start and stop - does that correspond to the Manual strategy?

No. Running=True is equivalent to RunStrategy=Always.

We want to move away from "Running" for two reasons. First, it's confusing: Running in the spec is a request for a state, not a guarantee that the VMI can actually run (e.g. the resource request may not be satisfiable by any running node). People have seen the word "Running" and assumed it was a true reflection of the state of the VM. Second, it turned out we needed richer policies in some cases, like RerunOnFailure: some users prefer to shut down the VMI from inside the guest and found it confusing when it came back online immediately.

> 2. If I set running to true, then in your situation, will they all be brought up again after the nodes come back?

Yes.

> 3. For the runStrategy RunStrategyRerunOnFailure, what failure are we talking about here? Are we continuously monitoring for VMI failure, or just the exit code?

Just the exit code. Since qemu is continuously monitoring the state of the guest, we can trust that the exit code reflects what happened, and we don't need to react until the guest is offline anyway.

> As for the issue you see, is VMI recreation handled by a virt-controller?

Yes.

> Would some rate limiting at the virt-controller processing queue help?

We still want to react to cluster state changes as quickly as possible. A backoff should only be introduced for crashed VMIs, to lessen the impact of a restart storm.
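Item 2 from the original proposal (capping how many VMIs may sit in the startup state at once) amounts to a counting gate, which could be sketched as below. The TryAcquire/Release names and the cap value are hypothetical, chosen for illustration; this is not an actual KubeVirt API.

```go
package main

import "fmt"

// startupGate caps how many VMIs may sit between "created" and
// phase=Running at once. A controller would call TryAcquire before
// creating a VMI and Release once the VMI reports Running (or fails).
type startupGate struct {
	inFlight, max int
}

func (g *startupGate) TryAcquire() bool {
	if g.inFlight < g.max {
		g.inFlight++
		return true
	}
	return false // defer this VMI start; requeue it for later
}

func (g *startupGate) Release() {
	if g.inFlight > 0 {
		g.inFlight--
	}
}

func main() {
	g := &startupGate{max: 2}
	fmt.Println(g.TryAcquire(), g.TryAcquire(), g.TryAcquire()) // true true false
	g.Release()
	fmt.Println(g.TryAcquire()) // true
}
```

Combined with the crashloop backoff, a gate like this would smooth a mass-restart event into a bounded rolling start instead of one thundering herd.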