Self-Managed, GCP Alternatives to Composer

Ethan Lyon

Jun 12, 2019, 7:52:10 AM
to cloud-composer-discuss
Hey Folks,

I've been having issues with Composer failing without errors in both Stackdriver Logs and in the webserver UI. It's been a roadblock for weeks -- burning time and money -- and now we need to start looking for alternatives. It might be a Composer bug but there isn't support for this type of issue.

I'm looking for a self-managed alternative so I can get a better understanding of what's happening with the instance. Does anyone have experience managing their own Airflow instance on GCP?

I've found this article, but before I go too far down the self-managed path, I'd love to hear others' opinions on, or solutions for, a Composer alternative on GCP.

Thanks in advance,

- Ethan

Sean Davis

Jun 12, 2019, 9:47:41 AM
to Ethan Lyon, cloud-composer-discuss
If you do not need to scale out (can use a LocalExecutor), setting up on a single GCP VM should be fine. I actually run Airflow on my laptop for testing and low-resource jobs in exactly the same way that you would on a GCP VM. If you start having to scale out with additional resources (VMs), things get more complicated with Airflow infrastructure and management, for sure. That said, GCP has nothing to do with that complication.

Sean

--
Sean Davis, MD, PhD
Center for Cancer Research
National Cancer Institute
National Institutes of Health
Bethesda, MD 20892

Ethan Lyon

Jun 12, 2019, 4:32:37 PM
to Sean Davis, cloud-composer-discuss
That's a great idea, Sean. Are you increasing the size of the VM to deal with smaller steps in scaling? I'm curious where the LocalExecutor breaks down.

We really don't do that much processing: 10 DAGs at most, all lightweight.

Sean Davis

Jun 12, 2019, 9:41:49 PM
to Ethan Lyon, cloud-composer-discuss
You'll have to think through what concurrency and compute you need to support. Airflow doesn't do magic with respect to managing resources, so you'll have to look at each of your DAGs to determine maximum requirements and then look at whether DAGs ever need to run simultaneously. Finally, you may want to look into configuration of Airflow to protect your setup a bit (reducing concurrency, etc.).
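As a rough sketch of what "reducing concurrency" can look like: Airflow reads any configuration option from an environment variable named AIRFLOW__{SECTION}__{KEY}, so the limits can be set on the VM before the scheduler starts. The key names below are the ones used by Airflow 1.10 (later releases renamed some of them, so check the configuration reference for your version), and the numbers are placeholders rather than recommendations.

```python
# Sketch only: build the environment variables that cap Airflow concurrency.
# Key names follow Airflow 1.10; the values are hypothetical placeholders.

def airflow_env_name(section, key):
    """Return the environment-variable name Airflow maps to [section] key."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# Hypothetical limits for a small single-VM deployment.
SETTINGS = {
    ("core", "executor"): "LocalExecutor",
    ("core", "parallelism"): 4,              # task instances running at once, all DAGs
    ("core", "dag_concurrency"): 2,          # running tasks allowed within one DAG
    ("core", "max_active_runs_per_dag"): 1,  # default cap on concurrent runs per DAG
}


def build_env(settings):
    """Turn {(section, key): value} into {ENV_NAME: "value"}."""
    return {airflow_env_name(s, k): str(v) for (s, k), v in settings.items()}


if __name__ == "__main__":
    # Print shell export lines that could be pasted into a startup script.
    for name, value in sorted(build_env(SETTINGS).items()):
        print("export {}={}".format(name, value))
```

The same options can be set in airflow.cfg instead; environment variables are just easier to keep in a VM startup script.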

Sean



ilker karapanca

Jun 13, 2019, 12:12:26 AM
to cloud-composer-discuss
Hi Ethan,
I have used Airflow with Docker on a server for the last 5 months.

As Sean mentioned, concurrency is important: otherwise the containers will eat all the resources on the host machine. That took me some time to figure out.

Another thing: if you backfill, Airflow can run the DAG simultaneously for different execution dates. To limit this, set max_active_runs=1 in the DAG definition, so that only one run executes at a time for that DAG.
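A minimal sketch of that setting, assuming a DAG file along these lines. The dag_id and start_date are made up for illustration; only max_active_runs comes from the advice above. The Airflow import is kept inside the function so the settings can be read without Airflow installed.

```python
from datetime import datetime

# Hypothetical DAG settings; max_active_runs=1 is the part that matters here.
DAG_KWARGS = {
    "dag_id": "example_backfill_safe",   # hypothetical name
    "start_date": datetime(2019, 6, 1),  # hypothetical start date
    "max_active_runs": 1,                # at most one run of this DAG at a time
}


def build_dag():
    """Create the DAG object; requires Airflow to be installed."""
    from airflow import DAG  # imported lazily on purpose

    return DAG(**DAG_KWARGS)
```

In a real DAG file you would assign the result at module level (for example `dag = build_dag()`) so the scheduler can discover it, and then attach tasks to it as usual.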

Finally, you can set a different execution time for each DAG. To make it more elegant you can use Airflow pools. I haven't used them, but I think they are the right place to look to limit DAG runs at the Airflow level.


If you use Docker, do not forget to persist the Airflow database.
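One way to do that, assuming the metadata database runs outside the Airflow container (a separate PostgreSQL instance, for example), is to point Airflow at it through the sql_alchemy_conn setting. The sketch below only builds the connection URL. The credentials and host are hypothetical, and the setting lives in the [core] section in Airflow 1.10 (newer releases may place it in a different section).

```python
from urllib.parse import quote_plus


def metadata_db_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy URL for a PostgreSQL metadata database."""
    # quote_plus escapes characters such as "/" or "@" in the credentials.
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Hypothetical credentials and host, for illustration only.
url = metadata_db_url("airflow", "s3cret/pw", "10.0.0.5", "airflow")

# Environment variable to pass to the Airflow container (Airflow 1.10 naming).
env = {"AIRFLOW__CORE__SQL_ALCHEMY_CONN": url}
```

If the database runs in a container as well, its data directory needs to sit on a mounted volume; otherwise the DAG run history is lost when the container is recreated.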

Regards,
Ilker

Andre Fernandes

Jun 13, 2019, 7:08:00 AM
to cloud-composer-discuss
I also had a lot of issues with Composer.

Moving to a single self-managed instance was the best thing I could have done. It was not terribly difficult to configure the LocalExecutor either. The main issues I had were setting up the database and setting up authentication so that the variables are not exposed.

A couple of advantages of having a bigger instance instead of several instances on Kubernetes:
- a bigger instance means better network bandwidth
- management of computational resources is a lot easier (bird's-eye view)
- no issues with logging when a remote task fails (e.g. an OOM that kills worker pods)

Rick Otten

Jun 13, 2019, 4:29:42 PM
to cloud-composer-discuss
It isn't self-managed, but http://astronomer.io is a competitor to GCP Composer that also runs Airflow. I've never tried them, but I have heard good things, and they have really good documentation.

We ran our own Airflow instance for a few years. It is a significant hassle to get it to work. At least as of a year ago, it did not run "out of the box" without a lot of tweaking. I thought getting it working well was enough of a challenge to be worthy of a local conference talk, but none of the local conferences were interested, and I've lost my notes on it since then. (They are probably outdated anyhow.)