Composer v2 webserver unstable


Tobias Kaymak

Nov 15, 2021, 11:27:10 AM
to cloud-composer-discuss
Hello,

We are migrating from Composer v1 (1.10.15) to Composer v2.

After creating a medium-sized environment, adjusting our DAGs, and copying them and their plugins over, we see that the webserver is marked as unstable and is being restarted several times. The webserver has enough resources in terms of RAM, CPU, and disk space.

Is there anything we should look at? So far we have tried creating a new environment from scratch.

[Attached: Screenshot 2021-11-15 at 17.25.04.png, image.png]

Best,
Tobi

Uri Goldstein

Nov 15, 2021, 2:31:33 PM
to cloud-composer-discuss
Hi Tobi,

I'm no expert on Airflow/Composer, but perhaps you can try looking at the health check that precedes the restarts?
From the log you posted it looks like something is "curling" /_ah/Health successfully (?), but 2 seconds later a hang-up signal is sent that makes the webserver restart.
Maybe a closer look at Airflow's health check feature could show you why it thinks the webserver is unstable?
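For reference, Airflow's webserver also exposes its own /health endpoint (separate from the /_ah/Health probe in your log), which reports the metadatabase and scheduler status. Something like this could tell you what Airflow itself thinks; the host below is just a placeholder for your Airflow web UI URL:

# Sketch: query Airflow's own health endpoint.
curl -s https://<your-airflow-webserver-host>/health
# Expected JSON shape (values illustrative):
# {"metadatabase": {"status": "healthy"},
#  "scheduler": {"status": "healthy", "latest_scheduler_heartbeat": "..."}}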

Cheers,
Uri

raphael auv

Nov 16, 2021, 6:02:37 AM
to cloud-composer-discuss

Shyam Ashar

Nov 16, 2021, 11:03:42 AM
to raphael auv, cloud-composer-discuss
Try increasing the number of webserver workers. By default it is 2, and if you have a lot of DAGs the webserver becomes very unstable. The gunicorn documentation suggests (2 × number of cores) + 1 workers. As a managed service, Cloud Composer should ideally ship with these settings, but it looks like they don't do that out of the box, which is disappointing.
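If it helps, this is roughly how the override can be applied with gcloud (a sketch; ENV_NAME and LOCATION are placeholders for your environment, and webserver-workers maps to Airflow's [webserver] workers option):

# Sketch: raise the gunicorn worker count for the Airflow webserver.
# ENV_NAME and LOCATION are placeholders for your environment.
gcloud composer environments update ENV_NAME \
    --location LOCATION \
    --update-airflow-configs=webserver-workers=5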




Tobias Kaymak

Nov 17, 2021, 4:16:06 AM
to cloud-composer-discuss
Thank you for your responses.

We have only 39 DAGs, and I just tried setting the number of webserver workers to 3 and then 4, with no effect on the webserver health.
I have the feeling that this is a bug in the Composer v2 setup itself, but I am out of ideas about where to look.

Best,
Tobias
[Attached: Screenshot 2021-11-17 at 10.13.27.png]

Tobias Kaymak

Nov 17, 2021, 4:19:42 AM
to cloud-composer-discuss
What I also found is that there seems to be a misconfiguration of the webserver proxy that I can't change:
[Attached: Screenshot 2021-11-17 at 10.18.36.png]

Tobias Kaymak

Nov 17, 2021, 8:09:05 AM
to cloud-composer-discuss
I dug deeper and I am pretty sure this is a misconfiguration of the v2 build:
I disabled the liveness and readiness probes on the webserver pod in the GKE cluster running Composer v2.
Even then, I still saw the following messages in the logs:

[Attached: Screenshot 2021-11-17 at 14.04.32.png]
This IP (10.63.129.6) belongs to the airflow-monitoring pod in the composer-system namespace. It probes the webserver, and 2 seconds later gunicorn receives the HUP signal.

Investigating the monitoring pod is not so easy, as it's secured through GKE Autopilot's authorization.
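For anyone who wants to check the probe configuration themselves, this is roughly what I did (a sketch; the namespace and deployment names are guesses on my part and may differ per environment):

# Sketch: find the webserver workload and dump its probe configuration.
# The namespace and deployment names below are assumptions, not a
# Composer contract.
kubectl get deployments --all-namespaces | grep -i webserver
kubectl get deployment airflow-webserver -n <composer-namespace> -o yaml \
    | grep -B2 -A8 -E 'livenessProbe|readinessProbe'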

Tobias Kaymak

Nov 25, 2021, 2:59:45 AM
to Tobias Kaymak, cloud-composer-discuss
Further finding: the gcs-sync sidecar pod seems to cause the webserver to error out when it cleans up the .pyc files, which looks like a race condition:
[The example file here is a plugin from the plugins folder that is being cleaned by the gcs-sync sidecar; the webserver pod then can't find it and shuts down.]
Removing file:///home/airflow/gcs/plugins/operators/__pycache__/trigger_emarsys_event_operator.cpython-38.pyc
{webserver_command.py:217} ERROR - [Errno 2] No such file or directory: '/home/airflow/gcs/plugins/operators/__pycache__/trigger_emarsys_event_operator.cpython-38.pyc'
{webserver_command.py:218} ERROR - Shutting down webserver
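A possible mitigation I'm considering (untested, and it's my assumption that the .pyc files are the trigger): keep Python from writing bytecode files in the first place, so the sidecar has nothing to clean up:

# Sketch (untested): stop Python from writing __pycache__/*.pyc files.
# ENV_NAME and LOCATION are placeholders for your environment.
gcloud composer environments update ENV_NAME \
    --location LOCATION \
    --update-env-variables=PYTHONDONTWRITEBYTECODE=1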
