Final update to save the next guy or gal some headaches:
- istio/envoy shuts down as soon as the pod receives a termination request, so any syncs across nodes that take longer than the proxy shutdown will fail
- if Envoy takes longer to become ready than RabbitMQ takes to start contacting other nodes in the cluster, those connection attempts will fail
Both of these problems come down to how k8s handles sidecars: there is no concept of interdependency between containers in a pod. That may be addressed by the native sidecar container support in newer versions of k8s (1.28 and beyond).
Resolution in the meantime: we added overrides to our RabbitMqCluster CRD instances to configure the istio-proxy, per what we saw in this IstioCon 2021 talk:
"Istio is a long wild river: how to navigate it safely" | IstioCon 2021 (credit to Raphael Fraysse and the team at Mercari).
- holdApplicationUntilProxyStarts = true: we did not set this in the meshConfig, which would have affected our full cluster, but instead added it to the RabbitMqCluster via the proxy.istio.io/config annotation. This can be set by adding it to `override.statefulSet.spec.template.metadata.annotations` (see the sketch after this list).
- added a preStop hook to the istio-proxy container to ensure it stops after every other container in the pod. The preStop hook was used as stated in the slides and applied as part of `override.statefulSet.spec.template.spec.containers`, with a container definition for istio-proxy that contains only the preStop lifecycle hook. All other settings for istio-proxy get merged in correctly without being specified in the CRD (also shown in the sketch below).
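For reference, a rough sketch of what the combined override can look like on a RabbitmqCluster resource. The resource name is illustrative, and the preStop command shown is the variant commonly circulated for this purpose; it may differ from the exact command in the slides, so substitute whatever matches your Istio version:

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-rabbit            # illustrative name
spec:
  replicas: 3
  override:
    statefulSet:
      spec:
        template:
          metadata:
            annotations:
              # per-pod proxy config instead of mesh-wide meshConfig
              proxy.istio.io/config: |
                holdApplicationUntilProxyStarts: true
          spec:
            containers:
              # only the fields set here are patched onto the injected
              # istio-proxy container; everything else is merged in
              - name: istio-proxy
                lifecycle:
                  preStop:
                    exec:
                      command:
                        - /bin/sh
                        - -c
                        # keep envoy alive until no other container in the
                        # pod still has open listeners/connections
                        - while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l) -ne 0 ]; do sleep 1; done
```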
Again, in our case, these were all added to our RabbitMqCluster CRD instances. If you are looking to make kubernetes cluster-wide changes to how istio is configured, the meshConfig options listed in the slide deck may be of use to you; my scope was limited to working through the issues for our RabbitMQ clusters.
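For completeness, a minimal sketch of the mesh-wide alternative, assuming you install Istio via an IstioOperator overlay: setting the flag under meshConfig.defaultConfig applies it to every injected sidecar in the mesh, not just RabbitMQ pods.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      # mesh-wide equivalent of the per-pod proxy.istio.io/config annotation
      holdApplicationUntilProxyStarts: true
```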