Configuring :onyx/tenancy-id, :job-scheduler and :zookeeper/address

Anuj Kumar

Jul 31, 2016, 12:00:55 PM

to onyx...@googlegroups.com

Hi,

I recently started exploring Onyx (0.9.9) and have been going through the Onyx starter project to understand the basics. I have a couple of questions about configuration parameters and where they are placed:

If we have already specified the :onyx/tenancy-id in the environment configuration why do we need to specify the :onyx/tenancy-id again for peer configuration? Shouldn't we reuse environment configurations?
Similarly, for :job-scheduler, we specify once in the peer configuration and once with the job definition that eventually overrides the peer configuration.
Similarly, we specify :zookeeper/address at both environment and peer level

Just curious to know the use case for which these are permitted.

References

http://www.onyxplatform.org/docs/cheat-sheet/latest/#env-config/:onyx/tenancy-id

http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx/tenancy-id

http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx.peer/job-scheduler

https://github.com/onyx-platform/onyx-starter/blob/0.9.x/src/onyx_starter/launcher/submit_sample_job.clj#L24

http://www.onyxplatform.org/docs/cheat-sheet/latest/#env-config/:zookeeper/address

http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:zookeeper/address

Thanks,

Anuj

Mike Drogalis

Jul 31, 2016, 4:33:06 PM

to Anuj Kumar, Onyx

Hello! These are good questions!

Generally, the development environment and the peers have separate configurations because they serve different purposes.

First, the dev env. The development environment is a set of in-memory versions of components that typically run out-of-process, for instance ZooKeeper, BookKeeper, and Aeron. You'd only use the development environment during dev and testing. You have to supply things like addresses and ports to the dev env because it does in fact start real implementations of those services.

On the other hand, the peers have their own configuration because they are designed to be completely ignorant of whether they are running locally or in a distributed environment. A peer wasn't built to infer that it's running locally and pick up the dev env settings.

To answer the specifics (out of order):

- Similarly, we specify :zookeeper/address at both environment and peer level

The dev env starts a ZooKeeper server at the given address. The peer starts a ZooKeeper client connection to the given address. The addresses match because one would presumably want the peer ZK client to connect to the env ZK server.
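As a rough illustration of the server/client split described above, here is a minimal sketch of the two maps side by side. The address, port, and tenancy id are made-up values, the def names are hypothetical, and the peer map is not a complete peer configuration; check the cheat sheet for the keys your Onyx version requires.

```clojure
;; Illustrative values only; pick whatever suits your setup.
(def tenancy-id "my-dev-tenancy")

;; Env config: tells the dev env to START an in-memory ZooKeeper server.
(def env-config
  {:onyx/tenancy-id tenancy-id
   :zookeeper/address "127.0.0.1:2188"
   :zookeeper/server? true
   :zookeeper.server/port 2188})

;; Peer config: tells each peer where its ZooKeeper CLIENT should connect.
;; The address matches the env config so the client finds the server above.
(def peer-config
  {:onyx/tenancy-id tenancy-id
   :zookeeper/address "127.0.0.1:2188"
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
```

In a distributed deployment, only the peer map would be used, with :zookeeper/address pointing at an externally managed ZooKeeper.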

- Similarly, for :job-scheduler, we specify once in the peer configuration and once with the job definition that eventually overrides the peer configuration.

I actually don't see any instances of this occurring. If you can point me to a spot where we're doing this, I'm 99.9% sure it's unnecessary or old. I did see one instance of a scheduler in your link here, but that's a task scheduler, not a job scheduler, and it is being used at job submission time, which is separate from both scenarios being discussed here.

- If we have already specified the :onyx/tenancy-id in the environment configuration why do we need to specify the :onyx/tenancy-id again for peer configuration? Shouldn't we reuse environment configurations?

Having the tenancy-id in the development environment is useful in the event that you're using a combination of in-memory and out-of-process services, like in-memory ZooKeeper and out-of-process BookKeeper. Admittedly that's pretty rare and unusual, but it's helpful for us, as the developers of Onyx, to be able to introspect into a tenancy-id for multiple running dev envs.

Does that help? Happy to continue clarifying. Good questions!


Anuj Kumar

Jul 31, 2016, 10:38:18 PM

to Mike Drogalis, Onyx

Thanks Mike. Those are spot-on answers.

On the job and task scheduler side, you are right, those two are different. That leads to a follow-up question on job scheduling vs. task scheduling: how are these used at different stages of a job?

Regards,

Anuj

Mike Drogalis

Jul 31, 2016, 11:06:14 PM

to Anuj Kumar, Onyx

Job schedulers figure out which peers get allocated to which jobs when more than one job is running simultaneously. Once the peers are bucketed into the jobs they will work on, it's up to the task schedulers to figure out which task each peer will work on for its respective assigned job.
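To make the two levels concrete, here is a small sketch of where each scheduler is typically declared: the job scheduler in the peer configuration, the task scheduler in the job map handed over at submission time. The def names, workflow, and task names are hypothetical, and the catalog and lifecycles are left empty for brevity, so this is not a submittable job as written.

```clojure
;; Job scheduler: set once per cluster, in the peer config.
;; It decides how peers are divided among concurrently running jobs.
(def peer-config-fragment
  {:onyx.peer/job-scheduler :onyx.job-scheduler/balanced})

;; Task scheduler: set per job, in the job map given at submission time.
;; It decides which task each peer assigned to THIS job will execute.
(def job
  {:workflow [[:in :inc] [:inc :out]] ; hypothetical task names
   :catalog []                        ; catalog entries omitted in this sketch
   :lifecycles []                     ; lifecycles omitted in this sketch
   :task-scheduler :onyx.task-scheduler/balanced})
```

With this split, two jobs submitted to the same cluster share one job scheduler but may each choose a different task scheduler.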

Anuj Kumar

Jul 31, 2016, 11:07:47 PM

to Mike Drogalis, Onyx

Thanks Mike.
