Greetings,
I have been encountering this known issue in Composer on and off. In the process I noticed that my Composer GKE cluster, which is set to autoscale between 1 and 6 n1-standard-1 nodes, never seems to scale down below 6 nodes.
Running some commands (kubectl top nodes), I noticed that each node always hovers at around 70% RAM usage, even when idle (i.e. no DAGs are running)!
Investigating further, I narrowed the high RAM usage down mostly to the airflow-worker deployment, whose 6 pods together take 8 GB of RAM (airflow-scheduler is also quite hungry, but only at 600 MB total).
Once I logged into a pod belonging to the deployment, I identified the culprit: a large number of celeryd subprocesses, each using about 119,96 MB of RAM. They appear in the process list with names of the form:
[celeryd: celery@airflow-worker-776fb6f5b7-sl8h2:ForkPoolWorker-7]
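To put a number on this per pod, the process list can be totalled with a small script. The following is only a sketch: it assumes the text comes from `ps -eo rss,args` run inside the worker pod (RSS reported in KiB), and the function name `celeryd_rss_mb` is made up for illustration.

```python
from typing import Tuple


def celeryd_rss_mb(ps_output: str) -> Tuple[int, float]:
    """Count celeryd ForkPoolWorker processes and sum their RSS in MB.

    Assumes `ps_output` is the text of `ps -eo rss,args`, with RSS in KiB.
    """
    count = 0
    total_kib = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        rss_kib, args = parts
        if "celeryd:" in args and "ForkPoolWorker" in args:
            count += 1
            total_kib += int(rss_kib)
    return count, total_kib / 1024


if __name__ == "__main__":
    # Hypothetical sample, not real output from my cluster.
    sample = (
        "   RSS COMMAND\n"
        "122880 [celeryd: celery@airflow-worker-776fb6f5b7-sl8h2:ForkPoolWorker-7]\n"
        "122880 [celeryd: celery@airflow-worker-776fb6f5b7-sl8h2:ForkPoolWorker-8]\n"
        "  4096 /bin/bash\n"
    )
    print(celeryd_rss_mb(sample))
```

RSS counts pages shared between forked workers once per process, so the total is likely an upper bound on what the pool really costs.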
Each airflow-worker pod can have a dozen or more of these subprocesses at any given time, all consuming memory, even though the Cloud Composer GKE cluster backs an Airflow deployment that is mostly dormant in terms of DAG runs. What is going on?
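As a rough sanity check that these subprocesses account for the deployment's footprint, the figures above can be combined. This is back-of-the-envelope arithmetic only: it rounds the per-process usage to 120 MB and assumes the 8 GB is spread evenly over the 6 pods.

```python
# Figures from the observations above (rounded, approximate).
total_worker_ram_mb = 8 * 1024   # airflow-worker deployment, all pods
pods = 6
per_process_mb = 120             # assumed round figure per celeryd subprocess

ram_per_pod_mb = total_worker_ram_mb / pods
processes_per_pod = ram_per_pod_mb / per_process_mb

print(round(ram_per_pod_mb), "MB per pod")
print(round(processes_per_pod, 1), "celeryd subprocesses per pod")
```

That works out to roughly 11 subprocesses per pod, which matches the "dozen or so" I see in each pod.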