Prod Incident: Airflow environment health - unhealthy


Debodirno Chandra

Jul 25, 2024, 6:42:10 AM
to cloud-composer-discuss
Hello everyone,

We need some urgent guidance on how to avoid an incident like this in the future, especially one that lasts this long.
[Attached screenshots: 1.png, 2.png, 3.png, 4.png]

An Airflow worker died because of high disk usage. However, no new worker came up for a very long time, and this affected our production.

Any idea why this could have happened, and how can we mitigate it? Ideally, if one worker dies, a replacement should come up quickly.

We had to manually change the environment configuration (worker memory), and a new worker came up once that change was applied.
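For anyone reproducing the workaround, here is a minimal sketch of what such a configuration change might look like from the command line. It assumes a Cloud Composer 2 environment, where `gcloud composer environments update` accepts a `--worker-memory` flag. The environment name, location, and memory value are hypothetical placeholders, not the values from this incident. The helper only builds and prints the command; it does not run it.

```python
import shlex
from typing import List


def build_update_command(environment: str, location: str, worker_memory: str) -> List[str]:
    """Build (but do not execute) a gcloud command that changes worker memory.

    All argument values passed in below are illustrative placeholders.
    """
    return [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
        "--worker-memory", worker_memory,
    ]


# Hypothetical environment name, region, and memory size.
cmd = build_update_command("example-environment", "us-central1", "4GB")

# Print the command so it can be reviewed before being run by hand.
print(shlex.join(cmd))
```

Printing rather than executing keeps the change reviewable, which seems preferable for a production environment.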

Any pointers?