Dynamic tasks in DAG causing "502 Server Error"


Akshay Iyengar

May 7, 2018, 1:46:33 PM
to cloud-composer-discuss
I wrote up a simple dag as follows:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 5, 5),
    'email': ['air...@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'schedule_interval': timedelta(1),
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG('tutorial', default_args=default_args)

sample_command = """ echo Call no. {{ params.test_param }} """

# Create one BashOperator per value of i; range(1, 100) yields 99 tasks.
for i in range(1, 100):
    t1 = BashOperator(
        task_id='print_date_' + str(i),
        bash_command=sample_command,
        params={'test_param': i},
        dag=dag)

Not the prettiest DAG, but it was to see how well a dynamic task would work. This works well when the range is 100 or 1000. However, making it 10000 instantly crashes the UI and triggers a "502 Server Error", which rights itself only when the code is replaced with a smaller range. While not intended to be a stress test, I was trying to see if a 4 node n1-standard-4 instance cluster would be able to handle this. Apparently not.

Maybe, as part of the documentation, you could include instance recommendations (sizes and no. of nodes) based on the types of tasks that the user wants to run? In the meantime, I'll see if adding more nodes would help. I'll update this thread with my findings.

Akshay Iyengar

May 7, 2018, 2:27:05 PM
to cloud-composer-discuss
Upping this to 8 and then 16 n1-standard-4 nodes has had no effect when the range is set to 10000.

Wilson Lian

May 7, 2018, 2:35:20 PM
to Akshay Iyengar, cloud-composer-discuss
Hi Akshay,

All Airflow microservices, including the webserver, parse your DAG definition files. I suspect that you're hitting an OOM error in the webserver, which is currently configured with 1GB of RAM. The webserver machine resources don't scale up with the machineType config, which only affects the scheduler and worker nodes. This is a known issue, and we're working towards a fix that will make the webserver more reliable. It might be the case that the scheduler and workers are suitably sized to handle the larger dynamic DAG. You can find out by checking your Composer Environment Stackdriver logs for the scheduler node to see if the scheduler reports successfully parsing the DAG.
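
To get a rough feel for what parsing costs before uploading a DAG file, something like the sketch below may help. It imports a Python file once with the standard library and reports wall time and peak traced memory. It is only an approximation: it does not use Airflow's own DAG loader, the helper name profile_dag_file is made up for this example, and the numbers will differ from what the 1GB webserver actually sees.

```python
import importlib.util
import time
import tracemalloc


def profile_dag_file(path):
    """Import a Python file once; report wall time and peak traced memory.

    This approximates the work a process does when it executes the
    top-level code of a DAG file. It is not Airflow's DagBag loader.
    """
    spec = importlib.util.spec_from_file_location("dag_under_test", path)
    module = importlib.util.module_from_spec(spec)

    tracemalloc.start()
    started = time.perf_counter()
    try:
        # Executing the module runs the task-generating loop, as a parse would.
        spec.loader.exec_module(module)
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {
        "seconds": elapsed,
        "peak_mib": peak_bytes / (1024 * 1024),
        "module": module,
    }
```

Calling profile_dag_file on the same file with the range set to 100, 1000 and 10000 should show how the cost grows with the number of generated tasks, assuming the file's imports are available locally.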

best,
Wilson

--
You received this message because you are subscribed to the Google Groups "cloud-composer-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-composer-discuss+unsub...@googlegroups.com.
To post to this group, send email to cloud-composer-discuss@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-composer-discuss/0de5ed78-faea-499e-85b4-8ab2a5d79bcc%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Akshay Iyengar

May 7, 2018, 3:30:32 PM
to cloud-composer-discuss
I see. So a completely separate machine hosts the webserver? Or 1 GB is all that is dedicated to the webserver on the machine on which it is installed? I followed your suggestion to look at the logs. I reloaded the DAG with the range set to 10000. It appears to parse correctly and progress along (though, understandably, slowly). The webserver, however, seems to have reached a zombie state :)

Is there an issue tracker where we can see (and possibly report) these issues?

Thanks for your help!
Akshay

Akshay Iyengar

May 7, 2018, 3:40:02 PM
to cloud-composer-discuss
Spoke too soon. The scheduler looks to be moving along well, except that the worker logs show nothing. No task is actually getting processed. What might be going on?

Akshay Iyengar

May 7, 2018, 4:15:43 PM
to cloud-composer-discuss
On a related note, can we see the Celery UI through Flower as well? Is there a plan to expose this? Or can Flower be installed separately through PyPI and set up by us?

Wilson Lian

May 7, 2018, 5:27:25 PM
to Akshay Iyengar, cloud-composer-discuss
Hi Akshay,

The webserver is hosted on a separate AppEngine machine that's not part of the Kubernetes Engine cluster. Until we have the scaling improvement in place for the AppEngine webserver, you can follow these instructions to launch a self-managed webserver in the Kubernetes Engine cluster itself: https://cloud.google.com/composer/docs/how-to/managing/deploy-webserver

There isn't a public issue tracker in the way an open source project might have one.

It's hard to tell what's going wrong with the workers, but the "GKE Container" resource logs should show more low-level details than the "Cloud Composer Environment" resource logs and are worth checking out.

You can install Flower as a PyPI package (https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies) and expose it via Kubernetes port forwarding. The steps to expose it are similar to the steps for deploying a self-managed webserver.

best,
Wilson


Wilson Lian

May 7, 2018, 7:13:21 PM
to Akshay Iyengar, cloud-composer-discuss
Hi Akshay,

Actually, the self-managed webserver instructions are a tad overkill for a one-off Flower debugging session. Once you've installed flower as a PyPI package, you can follow the steps below on your local machine:
  1. Determine the Cloud Composer environment's Kubernetes Engine cluster
  2. Connect to the Kubernetes Engine cluster
  3. Select a worker (or scheduler) pod (looks like "airflow-worker-[-a-f0-9]+"): kubectl get pods
  4. Run flower on the selected worker pod: kubectl exec -it POD_NAME_FROM_ABOVE -c airflow-worker -- flower --broker=redis://airflow-redis-service:6379/0 --port=5555
  5. In a separate, parallel session, forward local port to flower: kubectl port-forward POD_NAME_FROM_ABOVE 5555
  6. Visit http://localhost:5555/ in your local browser.
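
Steps 3 to 5 can be sketched in Python as below. This only picks a worker pod from the text that kubectl get pods prints and builds the two kubectl command lines from the steps above; it does not run them. The helper names pick_worker_pod and flower_commands are made up for this example, and matching on the "airflow-worker-" prefix is an assumption based on the pod name pattern mentioned in step 3.

```python
def pick_worker_pod(get_pods_output, prefix="airflow-worker-"):
    """Return the first pod name starting with `prefix`, or None.

    `get_pods_output` is the plain text printed by `kubectl get pods`,
    where the pod name is the first whitespace-separated column.
    """
    for line in get_pods_output.splitlines():
        columns = line.split()
        if columns and columns[0].startswith(prefix):
            return columns[0]
    return None


def flower_commands(pod_name, port=5555):
    """Build the argv lists for steps 4 and 5 (run each in its own session)."""
    run_flower = [
        "kubectl", "exec", "-it", pod_name, "-c", "airflow-worker", "--",
        "flower",
        "--broker=redis://airflow-redis-service:6379/0",
        "--port={}".format(port),
    ]
    forward_port = ["kubectl", "port-forward", pod_name, str(port)]
    return run_flower, forward_port
```

Joining each list with spaces gives the commands to paste into two terminals, after which step 6 applies unchanged.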
best,
Wilson


Brian Barnes

Jul 31, 2018, 1:44:24 PM
to cloud-composer-discuss
Similar issue here. I have about 500 MB of Python dependencies, and the webserver fails with a 502. Is there an ETA for when this will be fixed?



Maria Janczak

Aug 2, 2018, 7:04:07 PM
to bg...@cornell.edu, cloud-compo...@googlegroups.com
Hi Brian,
Generally, a 502 error is a result of a bad DAG being used. If you can give me your webserver URL, I can try to help pinpoint the problem.
-Maria


Brian Barnes

Aug 3, 2018, 2:24:42 PM
to Maria Janczak, cloud-compo...@googlegroups.com
https://e2cda2c096051fa8c-tp.appspot.com/admin/#

I am no longer getting 502s, but am still having trouble if you're willing to take a look. I'm unsure if behavior is non-deterministic or if it just takes a long time to reload dags.

Right now my dag is not showing up in the webserver UI, despite being located in the top level of the dags folder. At this point I'm not really sure how to proceed. Have not been able to glean anything useful from Stackdriver.

Brian


Brian Barnes

Aug 3, 2018, 3:46:26 PM
to Maria Janczak, cloud-compo...@googlegroups.com
For example:

error message: Broken DAG: [/home/airflow/gcs/dags/cloud_dag.py] No module named roboflow.preprocessing

dags/
  cloud_dag.py
  roboflow/
    __init__.py
    preprocessing/
      __init__.py
      preprocess.py

Shouldn't this work?


Feng Lu

Aug 9, 2018, 3:07:35 AM
to bg...@cornell.edu, Maria Janczak, cloud-composer-discuss
Hi Brian,

Is this still a problem for you? Looking at your directory structure, I would expect the following to work: import roboflow.preprocessing. 
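
One way to check the layout itself, independent of Composer, is to recreate it locally and try the import with the dags folder on sys.path. The sketch below does that with empty files. The helper names build_layout and can_import are made up for this example, and putting the folder on sys.path by hand is a stand-in for what the Airflow processes are expected to do with the dags folder.

```python
import importlib
import os
import sys


def build_layout(root):
    """Recreate the roboflow/ tree from the post under `root` with empty files."""
    os.makedirs(os.path.join(root, "roboflow", "preprocessing"))
    for relative in (
        ("roboflow", "__init__.py"),
        ("roboflow", "preprocessing", "__init__.py"),
        ("roboflow", "preprocessing", "preprocess.py"),
    ):
        with open(os.path.join(root, *relative), "w") as handle:
            handle.write("")
    return root


def can_import(dags_folder, module_name):
    """Return True if `module_name` imports with `dags_folder` first on sys.path."""
    top_level = module_name.split(".")[0]
    sys.path.insert(0, dags_folder)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        sys.path.remove(dags_folder)
        # Forget the package so a later check starts from a clean state.
        for name in list(sys.modules):
            if name == top_level or name.startswith(top_level + "."):
                del sys.modules[name]
```

If can_import(build_layout(some_empty_dir), "roboflow.preprocessing") returns True locally but the webserver still reports the broken DAG, the layout is probably fine and the difference lies in what each Airflow process can see.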

Feng 
