We recently upgraded Cloud Composer from image composer-2.8.6-airflow-2.6.3 to composer-2.13.5-airflow-2.10.5.
Right after the upgrade, we started to observe regular evictions of the airflow-redis pod. Each eviction causes airflow-scheduler to restart because it can't connect to airflow-redis:
> future: <Future finished exception=OperationalError('Error 111 connecting to airflow-redis-service.composer-system.svc.cluster.local:6379. Connection refused.')>
Previously (v2.8.6), airflow-redis had a PodDisruptionBudget (PDB) with maxUnavailable set to 0, which prevented the cluster autoscaler from evicting the pod:

    {
      "kind": "PodDisruptionBudget",
      "apiVersion": "policy/v1",
      "metadata": {
        "name": "airflow-redis-pdb",
        "namespace": "composer-system",
        "uid": "16ebc83a-68cf-49ff-8de4-54b30ba98698",
        "resourceVersion": "1748750859365743017",
        "generation": 1,
        "creationTimestamp": "2025-02-17T18:17:55Z"
      },
      "spec": {
        "selector": {
          "matchLabels": {
            "run": "airflow-redis"
          }
        },
        "maxUnavailable": 0
      }
    }

In the new version, this PDB is missing:

    $ kubectl -n composer-system get pdb
    NAME                            MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    airflow-database-init-job-pdb   N/A             0                 0                     146d
    airflow-sqlproxy-pdb            N/A             1                 0                     146d
    composer-agent-pdb              N/A             0                 0                     146d
    composer-snapshots-pdb          N/A             0                 0                     146d

Here's a cluster autoscaler log entry that shows airflow-redis-0 being evicted during a scale-down:

    {
      "insertId": "1ae29c21-f9af-4fb8-b3f4-0f6e0eb51515@a1",
      "jsonPayload": {
        "decision": {
          "eventId": "533694d2-4ea2-4ff1-8016-9069e2a89764",
          "scaleDown": {
            "nodesToBeRemoved": [
              {
                "node": {
                  "cpuRatio": 66,
                  "mig": {
                    "nodepool": "nap-xxx",
                    "zone": "xxx",
                    "name": "xxx"
                  },
                  "memRatio": 62,
                  "name": "xxx"
                },
                "evictedPodsTotalCount": 12,
                "evictedPods": [
                  {
                    ...
                  },
                  {
                    "namespace": "composer-system",
                    "name": "airflow-redis-0",
                    "controller": {
                      "name": "airflow-redis",
                      "apiVersion": "apps/v1",
                      "kind": "StatefulSet"
                    }
                  },
                  {
                    ...
                  }
                ]
              }
            ]
          },
          "decideTime": "1752135241"
        }
      },
      "resource": {
        "type": "k8s_cluster",
        "labels": {
          "project_id": "xxx",
          "location": "xxx",
          "cluster_name": "xxx"
        }
      },
      "timestamp": "2025-07-10T08:14:01.666488867Z",
      "logName": "projects/xxx/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
      "receiveTimestamp": "2025-07-10T08:14:02.170197809Z"
    }

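To see how often the autoscaler evicts airflow-redis, log entries with this shape can be filtered by a short script. This is only a minimal sketch, assuming the entries have been exported as JSON objects with the same structure as the one above. The function name `evicted_pods` and the trimmed `sample` entry are illustrative and not part of Composer or GKE.

```python
from typing import Iterator, Tuple


def evicted_pods(entry: dict, namespace: str = "composer-system") -> Iterator[Tuple[str, str]]:
    """Yield (node_name, pod_name) for every pod in `namespace` that a
    cluster autoscaler scale-down decision lists as evicted."""
    decision = entry.get("jsonPayload", {}).get("decision", {})
    scale_down = decision.get("scaleDown", {})
    for removal in scale_down.get("nodesToBeRemoved", []):
        node_name = removal.get("node", {}).get("name", "")
        for pod in removal.get("evictedPods", []):
            if pod.get("namespace") == namespace:
                yield node_name, pod.get("name", "")


# Trimmed copy of the log entry above (hypothetical export; elided pods omitted).
sample = {
    "jsonPayload": {
        "decision": {
            "scaleDown": {
                "nodesToBeRemoved": [
                    {
                        "node": {"name": "xxx"},
                        "evictedPods": [
                            {"namespace": "composer-system", "name": "airflow-redis-0"}
                        ],
                    }
                ]
            }
        }
    },
    "timestamp": "2025-07-10T08:14:01.666488867Z",
}

for node, pod in evicted_pods(sample):
    print(f"{sample['timestamp']}: {pod} evicted from node {node}")
```

Entries that are not scale-down decisions yield nothing, so the function can be run over a mixed export of autoscaler visibility logs.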
Is the missing airflow-redis PDB a regression?