Use Case / Best Practices Question


Miro Karpis

Aug 31, 2018, 12:34:45 PM
to cloud-composer-discuss
I'm trying to get some estimates for my workload. If I understand correctly, with the default Composer setup I have 3 nodes of the 'standard' machine type (3.75 GB each). These nodes run from the moment I create the environment until I delete it, not only while a job is running. Is this correct? On top of the VM prices there is also the Composer pricing.

So let's say I have a workload (a set of jobs) that I want to run once a day, and it takes 2 hours for all of them to complete. If I want to save some money, the most cost-effective setup would be to create a bash/python/cron job that would:

  1. Create a new Composer environment (every day) - takes around 20 minutes to start
  2. Run the DAG (download, process, report, ...)
  3. Delete the Composer environment

Is this the way to go? Of course, the best option would be to keep the environment running and not worry about it, but that would be a waste.
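As a concrete sketch of that create, run, delete cycle, the three steps could be driven from a script that shells out to the gcloud CLI. The environment name, location, and DAG id below are placeholders, and the exact gcloud subcommands should be double-checked against the current docs:

```python
# Sketch: build the three gcloud invocations for one daily
# create -> trigger -> delete cycle of a Composer environment.
# Environment name, location, and DAG id are placeholders.

def daily_cycle_cmds(env="ephemeral-composer", location="us-central1",
                     dag_id="daily_pipeline"):
    """Return the gcloud command lines, ready to pass to subprocess.run."""
    base = ["gcloud", "composer", "environments"]
    loc = "--location=" + location
    return [
        base + ["create", env, loc, "--node-count=3"],
        # "run" executes an Airflow CLI command inside the environment;
        # trigger_dag starts the DAG once.
        base + ["run", env, loc, "trigger_dag", "--", dag_id],
        base + ["delete", env, loc, "--quiet"],
    ]

if __name__ == "__main__":
    for cmd in daily_cycle_cmds():
        print(" ".join(cmd))
```

In a real script you would also need to wait for the DAG run to finish before deleting the environment, e.g. by polling the Airflow CLI for the DAG run state.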

Comments/ideas more than welcome :)

Cheers,
Miro

Tim Swast

Aug 31, 2018, 7:17:46 PM
to Miro Karpis, cloud-composer-discuss
What would you schedule steps 1, 2, and 3 with?

One of the main benefits of Cloud Composer is seeing historical job runs and being able to retry and do backfills. You lose that if you recreate the environment each time.

--
You received this message because you are subscribed to the Google Groups "cloud-composer-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-composer-di...@googlegroups.com.
To post to this group, send email to cloud-compo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-composer-discuss/31873fd2-4010-443d-8e29-cc97249214c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Tim Swast
Software Friendliness Engineer
Google Cloud Developer Relations
Seattle, WA, USA

Miro Karpis

Sep 1, 2018, 1:09:49 AM
to cloud-composer-discuss
Hi Tim,
I agree, that is the thing. I might be missing something here, but keeping 3 nodes running with no workload for, let's say, 80% of the time is a bit of a waste of resources compared to the whole GCP philosophy (fire up quickly, scale quickly). For example, what if I have a daily data import/pre-processing job that needs 50 GB RAM and takes 2 hours? How would you implement this in Composer? Do I need to run 3x50 GB RAM nodes 24/7?
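To put a rough number on that waste: with a purely assumed hourly rate for one ~50 GB machine (not a real GCP price, just an illustration), an always-on node costs 12x what the 2 hours of actual daily work would:

```python
# Back-of-envelope: always-on high-memory node vs. paying only for job hours.
# RATE_PER_HOUR is an assumption for illustration, not a real GCP price.
RATE_PER_HOUR = 0.35   # assumed $/hour for one ~50 GB machine
DAYS = 30
JOB_HOURS_PER_DAY = 2

always_on = RATE_PER_HOUR * 24 * DAYS           # node runs 24/7
job_only = RATE_PER_HOUR * JOB_HOURS_PER_DAY * DAYS  # pay only while working

print(f"always-on: ${always_on:.2f}/month, job-only: ${job_only:.2f}/month")
print(f"always-on costs {always_on / job_only:.0f}x more")
```

The 12x ratio is independent of the assumed rate; it is simply 24 hours paid for versus 2 hours used.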

Cheers,
m.

Miro Karpis

Sep 1, 2018, 2:29:44 AM
to cloud-composer-discuss
Starting to understand this :). So for my use case above, Composer would fire a Dataflow job, where I can define resources (RAM, etc.). This way, in theory, I only need the minimum Composer setup of 3 standard nodes, which would fire my 2-hour, 50 GB RAM Dataflow job.
After checking the price, the minimum Composer price of 3 standard nodes would be around $400/month running 24/7, plus all the additional costs of my Dataflow jobs.

Is this correct?
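For concreteness, one way to hand the heavy lifting to Dataflow from a minimal Composer setup is to shell out to gcloud and request a custom machine type for the workers. The job name, template path, and machine type below are placeholders, and the flags should be verified against the current gcloud documentation:

```python
# Sketch: build a gcloud command that runs a Dataflow template on a
# worker with roughly 50 GB of RAM. All names/paths are placeholders.

def dataflow_run_cmd(job="daily-import",
                     template="gs://my-bucket/templates/daily-import",
                     machine_type="custom-6-51200",  # 6 vCPUs, 50 GB RAM
                     max_workers=1):
    """Return the gcloud argument list, ready for subprocess.run."""
    return [
        "gcloud", "dataflow", "jobs", "run", job,
        "--gcs-location=" + template,
        "--worker-machine-type=" + machine_type,
        "--max-workers=" + str(max_workers),
    ]

if __name__ == "__main__":
    print(" ".join(dataflow_run_cmd()))
```

The point is that the memory-hungry resources are declared per job and billed only while the Dataflow job runs, while the Composer cluster itself stays at the small default size.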

Tim Swast

Sep 4, 2018, 12:17:16 PM
to Miro Karpis, cloud-composer-discuss
> After checking the price, the minimum Composer price of 3 standard nodes would be around $400/month running 24/7, plus all the additional costs of my Dataflow jobs.

Yes, about $400 per month is what I estimate for the Kubernetes cluster (workers + scheduler) plus the web core hours. For more details, you can review the pricing page: https://cloud.google.com/composer/pricing

I agree that this isn't the best use of resources if you're only using Composer to kick off one Dataflow job per day. Airflow doesn't yet fit well into a more efficient "serverless" model, but it's something the team is thinking about.
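To make the trade-off in this thread concrete: the combined hourly rate below is not an official price; it is simply derived from Tim's ~$400/month estimate so the two options can be compared on the same footing:

```python
# Rough comparison: always-on environment vs. recreating it daily.
# env_rate is a combined $/hour for the whole environment, derived
# from the ~$400/month estimate in this thread (not an official price).
HOURS_PER_MONTH = 730                      # average hours in a month
env_rate = 400 / HOURS_PER_MONTH           # ~ $0.55/hour, by construction

always_on = env_rate * HOURS_PER_MONTH     # $400/month by construction
daily_hours = 20 / 60 + 2                  # 20 min startup + 2 h of DAG runs
ephemeral = env_rate * daily_hours * 30    # pay only while the env exists

print(f"always-on: ${always_on:.0f}/month, ephemeral: ${ephemeral:.0f}/month")
```

Under these assumptions the ephemeral approach lands somewhere near a tenth of the always-on cost per month, which is the saving Miro is weighing against losing history, retries, and backfills.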


Narayanan K

Oct 23, 2018, 2:01:03 AM
to cloud-composer-discuss
I have a similar requirement for one of my projects. We mostly have monthly jobs to run, but we do like the nice GUI provided by Airflow.

Ideally, Composer would run the web server, scheduler, and database 24/7 and kick-start the workers only when required.

Narayanan

