Firing Docker-based ETL Operator from Composer Airflow

Maxim Veksler

Jul 5, 2018, 4:03:37 AM
to cloud-composer-discuss
Hi,

While exploring Composer, I'd like to hear the community's thoughts on the following:

Our ETL pipeline is built from Dockerized steps, each one started with some parameters and a default CMD.

If Composer is to run this pipeline, how should I get these Docker steps to run from within Composer's Airflow?

Several options come to mind:

  1. Get a dedicated Kubernetes cluster, and use BashOperator to run kubectl commands.
  2. Get Composer to run the Docker containers on the Kubernetes cluster it is already using for the worker nodes themselves.
  3. Explore Bloomberg's latest efforts on the Airflow Kubernetes Operator: https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/
  4. Any other pragmatic ideas?
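
As a rough sketch of option 1, assuming `kubectl` is installed on the Airflow workers and already authenticated against the dedicated cluster, the command for one step could be built and wrapped like this. The helper names, the step name, and the image path are hypothetical, and the import path is the Airflow 1.x one.

```python
import shlex


def kubectl_run_command(name, image, args, namespace="default"):
    """Build a `kubectl run` command that starts one Dockerized step as a pod.

    Everything after `--` is passed as container args, which override the
    image's CMD while keeping its ENTRYPOINT.
    """
    parts = [
        "kubectl", "run", name,
        "--image=" + image,
        "--restart=Never",   # one-off pod that is not restarted
        "--rm", "--attach",  # stream output and delete the pod when it exits
        "--namespace=" + namespace,
        "--",
    ] + list(args)
    # Quote every part so parameters with spaces survive the shell.
    return " ".join(shlex.quote(p) for p in parts)


def make_task(dag, step, image, args):
    """Wrap the command in a BashOperator (hypothetical helper)."""
    # Imported lazily so the command builder can be used without Airflow.
    from airflow.operators.bash_operator import BashOperator
    return BashOperator(
        task_id=step,
        bash_command=kubectl_run_command(step, image, args),
        dag=dag,
    )
```

Calling `kubectl_run_command("etl-extract", "gcr.io/example/etl-extract:latest", ["--date", "2018-07-05"])` yields a single shell string that can be inspected or run by hand before it is handed to Airflow.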


>M.

ma...@iihnordic.com

Jul 6, 2018, 5:41:19 AM
to cloud-composer-discuss
I'm interested in other replies as we have similar needs, but what we are doing currently is using another Kubernetes cluster and hooking the containers up to HTTP endpoints, then using Airflow to call via the HTTP operator.
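
A minimal sketch of that HTTP approach, assuming each container exposes a POST endpoint that runs the step and returns JSON. The connection id `etl_service`, the `/run/<step>` path, and the `{"status": "done"}` response shape are all made up for illustration; the operator import is the Airflow 1.x path.

```python
import json


def response_ok(response):
    """response_check callback: succeed only if the service reports completion.

    The {"status": "done"} payload is an assumed convention of the service.
    """
    return response.status_code == 200 and response.json().get("status") == "done"


def http_step_kwargs(step, params, conn_id="etl_service"):
    """Keyword arguments for one SimpleHttpOperator task that triggers a step.

    The Airflow connection `conn_id` is assumed to exist and to point at the
    service running in the other Kubernetes cluster.
    """
    return {
        "task_id": "run_" + step,
        "http_conn_id": conn_id,
        "endpoint": "/run/" + step,
        "method": "POST",
        "data": json.dumps(params, sort_keys=True),
        "headers": {"Content-Type": "application/json"},
        "response_check": response_ok,
    }


def make_task(dag, step, params):
    """Create the operator (hypothetical helper)."""
    # Imported lazily so the helpers above can be used without Airflow.
    from airflow.operators.http_operator import SimpleHttpOperator
    return SimpleHttpOperator(dag=dag, **http_step_kwargs(step, params))
```

Because the request blocks until the service answers, long-running steps would likely need a different pattern, such as submitting the job and polling for completion.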

Feng Lu

Jul 9, 2018, 3:28:45 AM
to ma...@vekslers.org, cloud-composer-discuss
Hi Maxim, 

Please see my reply inline: 

On Thu, Jul 5, 2018 at 1:03 AM Maxim Veksler <ma...@vekslers.org> wrote:
Hi,

While exploring Composer, I'd like to hear the community's thoughts on the following:

Our ETL pipeline is built from Dockerized steps, each one started with some parameters and a default CMD.

If Composer is to run this pipeline, how should I get these Docker steps to run from within Composer's Airflow?

Several options come to mind:

  1. Get a dedicated Kubernetes cluster, and use BashOperator to run kubectl commands.
+1 for the idea of a dedicated k8s cluster, but BashOperator might be a bit hard to use (e.g., you have to add extra logic to read back the k8s pod logs).
The Composer team just submitted a PR for running k8s pods in any GKE cluster (https://github.com/apache/incubator-airflow/pull/3532).
  2. Get Composer to run the Docker containers on the Kubernetes cluster it is already using for the worker nodes themselves.
We are backporting KubernetesPodOperator to Composer, which allows one to launch pods in the same worker cluster.
Please be aware of potential resource contention when running actual work inside the same cluster.
  3. Explore Bloomberg's latest efforts on the Airflow Kubernetes Operator: https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/
This requires architectural changes inside Composer. We are working on it, but there is no concrete ETA yet (it also depends on when Airflow 1.10 is officially released).
  4. Any other pragmatic ideas?
I would probably recommend option 1 with GKEPodOperator.
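
To make the two pod-operator replies above more concrete, here is a hedged sketch of how one Dockerized step might be declared. The import paths and parameter names follow the Airflow 1.10 contrib operators as best I recall them and may differ in the Composer backport; the project, zone, cluster, and image names are placeholders, and the helper functions are hypothetical.

```python
def pod_kwargs(step, image, arguments, namespace="default"):
    """Arguments shared by KubernetesPodOperator and GKEPodOperator."""
    return {
        "task_id": step,
        "name": step.replace("_", "-"),  # pod names cannot contain underscores
        "namespace": namespace,
        "image": image,
        # Container args override the image's CMD and keep its ENTRYPOINT.
        "arguments": list(arguments),
        "get_logs": True,  # copy pod output into the Airflow task log
    }


def make_gke_task(dag, step, image, arguments):
    """Option 1: run the pod in a separate, dedicated GKE cluster."""
    # Imported lazily so pod_kwargs can be used without Airflow installed.
    from airflow.contrib.operators.gcp_container_operator import GKEPodOperator
    return GKEPodOperator(
        project_id="my-project",     # placeholder
        location="us-central1-a",    # placeholder
        cluster_name="etl-cluster",  # placeholder
        dag=dag,
        **pod_kwargs(step, image, arguments)
    )


def make_same_cluster_task(dag, step, image, arguments):
    """Option 2: run the pod in whatever cluster the worker is configured for."""
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
    return KubernetesPodOperator(dag=dag, **pod_kwargs(step, image, arguments))
```

Both variants take the same pod description, so a pipeline could start in the worker cluster and move to a dedicated one later if resource contention becomes a problem.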
