Partitions when nodes rejoin the cluster while istio is in use


Tina Coleman

Oct 5, 2023, 8:39:31 AM
to rabbitmq-users
Scenario: 3-node RabbitMQ cluster, deployed using the RabbitmqCluster operator in an on-premises k8s cluster that has istio enabled with automatic sidecar injection via namespace annotation. The operator by default sets up a pause_minority partition-handling strategy.
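
For concreteness, a minimal sketch of the relevant pieces (names are placeholders; the injection toggle is shown as the common istio-injection namespace label, which may differ from your install):

```
apiVersion: v1
kind: Namespace
metadata:
  name: messaging                  # placeholder namespace
  labels:
    istio-injection: enabled       # enables automatic sidecar injection
---
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rmq                        # placeholder cluster name
  namespace: messaging
spec:
  replicas: 3
  rabbitmq:
    # The operator defaults to pause_minority; spelled out here only to
    # make the partition-handling strategy explicit.
    additionalConfig: |
      cluster_partition_handling = pause_minority
```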

Challenge: In smaller cluster environments (local, integration) with only a handful of services and pods, there is no problem: the cluster comes up cleanly and handles rolling restarts readily, whether we upgrade RabbitMQ, change the memory granted to the pods, etc.  In larger environments (staging, production), those same rolling restarts cause partitions.

For those larger clusters, the time for the istio-proxy to mark itself ready is longer.  We've tried to optimize it with an istio Sidecar configuration, which has taken it from ~15 seconds (!) down to more like 6-10, but we're still seeing issues.  As a validation test, we deployed a similar RabbitMQ cluster in staging with istio not applied (the namespace didn't have the annotation for automatic sidecar injection, and we verified that no istio-proxy sidecar was injected), and the rolling restart wasn't an issue.  So we're thinking the problem is that proxy startup time.
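
Roughly the shape of the Sidecar resource we're using to trim how much configuration the proxy has to load before it reports ready (hosts below are placeholders, not our real egress list):

```
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: rmq-sidecar                # placeholder
  namespace: messaging             # placeholder
spec:
  egress:
    - hosts:
        - "./*"                    # only services in this namespace
        - "istio-system/*"         # istiod, telemetry, etc.
```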

We think we're missing a configuration setting somewhere that would let the nodes wait before attempting to rejoin an existing cluster.  We've tried `delayStartSeconds`, and we've looked for `net_tick_timeout` messages in the logs.  What other angles have folks used, or would you suggest?  Something in Erlang?  `cluster_formation`?  An istio setting?
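
For reference, the kind of knob we've been experimenting with via the CRD (illustrative only, not confirmed as a fix; `additionalConfig` is appended to the generated rabbitmq.conf):

```
spec:
  rabbitmq:
    additionalConfig: |
      # Erlang distribution tick time (default 60). A peer is considered
      # down after roughly net_ticktime seconds of silence, so raising it
      # tolerates longer blips at the cost of slower failure detection.
      net_ticktime = 90
```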

(Note: shoutout to tgir: I'm working to replicate their ChaosMesh talk from episode 9 on the clusters that _don't_ exhibit the behavior, to see if I can cause it there and more rapidly try out potential resolutions.  That talk also gave us a way to surface partitions in Grafana for operational visibility - also highly useful.)

Tina Coleman

Oct 6, 2023, 3:30:11 PM
to rabbitmq-users
Theory:

When istio is in use, traffic goes out through an envoy sidecar.  That sidecar shuts down when the pod is terminated, and does so in ~5s.

I'm wondering whether RabbitMQ is still trying to reach the other nodes in the cluster beyond that 5s window and setting itself up for issues as the pod shuts down.

Update on ChaosMesh: I couldn't get it to work in our cluster, but that looks to be down to constraints in our cluster's configuration.  Still highly recommend the tgir talk.

Tina Coleman

Oct 13, 2023, 5:01:13 PM
to rabbitmq-users
Update on progress, in case it sparks any "Aha!" moments.  Once I crack this, I'll also post the resolution to spare the next guy or gal.

istio/envoy shuts down upon a pod termination request.  I've been working on adding preStop and postStart hooks to the istio-proxy container, and I've been able to wire them in via the RabbitmqCluster operator.  There are alternate means of wiring them in via istio itself, but I'm not yet ready to call this the resolution and change istio-proxy across the full cluster.  I'll be testing whether it resolves the issue on Monday.
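
Roughly what the override looks like in the RabbitmqCluster CR (a sketch, not the final thing; the preStop command is a placeholder for the wait-until-other-containers-are-done loop from the istio community material, and a postStart hook would sit in the same lifecycle block):

```
spec:
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
              # The istio-proxy entry carries only the lifecycle hook;
              # everything else is merged in by the operator/injection.
              - name: istio-proxy
                lifecycle:
                  preStop:
                    exec:
                      # Placeholder: the real hook waits until no
                      # non-envoy listeners remain before letting
                      # the proxy exit.
                      command: ["/bin/sh", "-c", "sleep 15"]
```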

Tina Coleman

Oct 23, 2023, 4:48:34 PM
to rabbitmq-users
Final update to save the next guy or gal some headaches:

- istio/envoy shuts down upon a pod termination request, so any syncs across nodes that take longer than that shutdown will not succeed
- if envoy takes longer to become ready than RabbitMQ takes to start reaching out to the other nodes in the cluster, those connection attempts will fail

Both of these problems come down to how k8s has handled sidecars: there's no concept of an ordering interdependency between containers in a pod.  That is being addressed by native sidecar containers in the newest versions of k8s (1.28 and beyond).
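
For completeness, this is the shape of the native sidecar support in k8s 1.28+ (not something we deployed; shown only to illustrate the eventual fix):

```
apiVersion: v1
kind: Pod
metadata:
  name: example                    # placeholder
spec:
  initContainers:
    # An init container with restartPolicy: Always acts as a sidecar:
    # it starts before the app containers and is stopped after them.
    - name: proxy                  # placeholder
      image: example/proxy:latest
      restartPolicy: Always
  containers:
    - name: app
      image: example/app:latest
```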

Resolution in the meantime: we added overrides to the RabbitMqCluster CRD to configure the istio proxy, per what we saw in this IstioCon 2021 briefing: "Istio is a long wild river: how to navigate it safely".  (Credit to Raphael Fraysse and the team at Mercari.)
- holdApplicationUntilProxyStarts = true: we did not add it to the meshConfig, which would have impacted our full cluster, but instead set it on the RabbitMqCluster pods via the proxy.istio.io/config annotation.  It can be set by adding it to `override.statefulSet.spec.template.metadata.annotations` (see the sketch after this list).
- terminationDrainDuration: also set via the same proxy.istio.io/config annotation.
- added a preStop hook to the istio-proxy container which ensures it is stopped only after every other container in the pod.  The preStop hook was used as stated in the slides and applied under `override.statefulSet.spec.template.spec.containers`, with a container definition for istio-proxy that solely contained the lifecycle preStop hook (as in the sketch from my Oct 13 update).  All other settings for istio-proxy got merged in appropriately without being specified in the CRD.
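
The annotation piece looks roughly like this (values illustrative; the proxy.istio.io/config value is inline proxy configuration that overrides the mesh defaults for just these pods):

```
spec:
  override:
    statefulSet:
      spec:
        template:
          metadata:
            annotations:
              proxy.istio.io/config: |
                holdApplicationUntilProxyStarts: true
                terminationDrainDuration: 30s
```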

Again, in our case, these were all added to the RabbitMqCluster CRD instances.  If you're looking to make kubernetes-cluster-wide changes to how istio is configured, the meshConfig options listed in the slide deck may be of more use to you.  My scope was just working through the issues for our RabbitMQ clusters.