Can Azkaban schedule jobs in a flow on different executors?


4133...@qq.com

Nov 5, 2017, 5:58:54 AM
to azkaban
For example, I have a flow:
  job C depends on job B, and job B depends on job A. The graph is   C------>B------->A
 How can I schedule it so that jobC runs on executor1, jobB on executor2, and jobA on executor1?

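My English is not very good, so here is the same question translated from Chinese:
   In Azkaban's cluster server mode, is it possible to have different jobs within one flow scheduled onto different executors?
  For example, given a flow where jobC depends on jobB and jobB depends on jobA, can I use configuration to make jobC run on executor1, jobB on executor, and jobA on executor3?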

Ray Yang

Nov 13, 2017, 11:22:20 AM
to azkaban
No. 

What's your use case? 

ke...@remitly.com

Mar 16, 2018, 1:14:28 PM
to azkaban
I'm not sure about their use case, but we have a similar ask for our schedulers. We have a bunch of data processing jobs in a workflow, and some need to run on an EMR cluster in AWS. The submission tools are on the EMR master node. There are a few ways to work around this, but the one we're using is to SCP the files from the Azkaban executor onto the EMR master and then SSH in to dispatch the jobs. This isn't ideal from an ops perspective, but a lot of the alternatives are worse (e.g. copying the EMR config to the executor makes cluster upgrades a pain, and running the executor on the master means all jobs run on the master). Having executor groups and being able to pin jobs to certain groups would allow us to host an executor on the EMR master and dispatch only EMR jobs to it.

Likewise, we've thought about a scheduler-as-a-service setup internally, and it'd be nice to have executor groups that you can pin entire workflows to (e.g. Team A's workflow only runs on Team A's servers, which have Team A's tooling, etc.).
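
For anyone reading along, the SCP-then-SSH workaround described above could be sketched roughly like this in Python. This is only an illustration, not our actual tooling: the host name, remote directory, and function names are all made up, and it assumes key-based SSH access from the executor to the EMR master and that the remote directory already exists.

```python
import shlex
import subprocess
from typing import List

# Hypothetical values: replace with your own EMR master login and staging path.
EMR_MASTER = "hadoop@emr-master.example.internal"
REMOTE_DIR = "/tmp/azkaban-jobs"  # assumed to already exist on the master


def build_scp_command(local_path: str, remote: str = EMR_MASTER,
                      remote_dir: str = REMOTE_DIR) -> List[str]:
    """Return the scp invocation that copies one file to the EMR master."""
    return ["scp", local_path, f"{remote}:{remote_dir}/"]


def build_ssh_command(remote_command: List[str],
                      remote: str = EMR_MASTER) -> List[str]:
    """Return the ssh invocation that runs remote_command on the EMR master."""
    # ssh hands the remote shell a single string, so quote each argument
    # to keep arguments containing spaces intact.
    quoted = " ".join(shlex.quote(part) for part in remote_command)
    return ["ssh", remote, quoted]


def dispatch(local_path: str, remote_command: List[str]) -> None:
    """Copy the file, then run the command remotely; raise if either fails."""
    subprocess.run(build_scp_command(local_path), check=True)
    subprocess.run(build_ssh_command(remote_command), check=True)
```

An Azkaban command job on the executor could then call something like `dispatch("etl.py", ["spark-submit", "/tmp/azkaban-jobs/etl.py"])`. Because `check=True` raises on a non-zero exit code, a failed copy or a failed remote submission would surface as a failed job on the Azkaban side.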

Ray Yang

Mar 17, 2018, 11:04:32 AM
to azkaban
Thanks for sharing your use cases.

Would a job type that can launch programs on a remote host satisfy your needs?

At LinkedIn, we offer Azkaban as a service. Most of the jobs run on Hadoop. There is currently no strong need to partition executors.

We do run multiple separate Azkaban clusters for fault isolation and to apply different policies, though. We plan to improve the reliability and scalability of Azkaban so that, in the future, we can run larger Azkaban clusters and connect to multiple Hadoop clusters and other execution backends from one Azkaban cluster.