Deployment & scaling

David Collie

Aug 22, 2016, 5:36:24 AM

to Onyx

Hi

After reading the docs, it's not clear to me what the deployment and scaling strategy for Onyx is. Is there some guidance I can read to understand this? I'm looking at Onyx as an option for processing ~20TB of data but need to understand how to scale it.

Thanks

Dave

Mike Drogalis

Aug 22, 2016, 11:41:32 AM

to David Collie, Onyx

Hi David,

Do you have any questions in particular? I assume you found these pages in the User Guide:

- Deployment

- Scheduling

Are you asking how you, as a user, scale Onyx? You can add more peers at runtime, and work will be transparently dispersed among all available peers. If you're asking how this works under the hood, see the Architecture chapter; I also gave a talk in the spring that focused on the design.
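To make "work will be transparently dispersed among all available peers" a little more concrete, here is a toy Python model of the idea. It is not Onyx's scheduler: Onyx is written in Clojure and has its own pluggable job and task schedulers (see the Scheduling chapter). The function name `balance_peers`, the task names, and the round-robin policy are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict


def balance_peers(peers, tasks):
    """Spread peers across a job's tasks as evenly as possible.

    Hypothetical, simplified model (not Onyx's real scheduler): each peer
    works on exactly one task, and a job can only run once every task has
    at least one peer. Returns {} when there are too few peers to cover
    every task.
    """
    if not tasks or len(peers) < len(tasks):
        return {}  # not enough peers yet; the job would have to wait
    assignment = defaultdict(list)
    # Deal peers out round-robin so task sizes differ by at most one.
    for i, peer in enumerate(peers):
        assignment[tasks[i % len(tasks)]].append(peer)
    return dict(assignment)


tasks = ["read-input", "transform", "write-output"]  # hypothetical task names

# Start with three peers, then "add peers at runtime" and rebalance.
print(balance_peers(["p1", "p2", "p3"], tasks))
print(balance_peers(["p1", "p2", "p3", "p4", "p5", "p6"], tasks))
```

With three peers each task gets one peer; doubling the peer count gives each task two, so every stage can run with more parallelism without any change to the job itself. That is the intuition behind scaling by simply starting more peers.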

--
You received this message because you are subscribed to the Google Groups "Onyx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to onyx-user+unsubscribe@googlegroups.com.
To post to this group, send email to onyx...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/onyx-user/e0ca9492-a30b-4516-809c-8e3441a22815%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Collie

Aug 23, 2016, 6:05:13 AM

to Onyx, dmco...@gmail.com

Mike

TL;DR: I am an experienced Java programmer (with some Clojure experience) but a big-data noob, and I want to understand how to build a scalable Onyx cluster with newfangled container technologies.

Thanks for getting back to me. I have zero experience with Onyx or similar tools (Spark, etc.). In the past we've typically built custom Java apps (running on Glassfish on virtualised but not containerised servers) to do this sort of data processing, but I can see many advantages in using Onyx to provide all the 'plumbing' instead. So I'm currently completely ignorant of how to approach building systems the Onyx 'way'.

I understand about adding virtual peers and have read the documentation you pointed to. What I would ideally like is some sort of best-practice guide on how to build an Onyx cluster that can dynamically scale to meet demand. The docs do mention Kubernetes very briefly as an option. I don't have any experience (yet) with Kubernetes, Docker, or containers, so it's hard for me to be confident that any approach I take will scale. Ideally there would be a reference architecture I could refer to.

Any help appreciated. I know that these are noob questions.

Thanks

Dave

Mike Drogalis

Aug 23, 2016, 11:37:50 PM

to David Collie, Onyx

Onyx is a general-purpose data processing platform, so we can't really point you to a single architecture for implementing an application. Onyx is deliberately hands-off about how the cluster is set up, deployed, and managed; other platforms typically force the issue and make you work around some kind of deployment abstraction. You don't have to run Onyx in containers; that's just how most people happen to use it.

As for gaining confidence in Onyx's ability to handle more load as machines are added, I'd suggest continuing to study Onyx's architecture, or deploying the benchmark to run a cluster in the cloud.

I don't have much to say about how an Onyx cluster ought to look on Kubernetes, in AWS, or in any other environment, but many production users who can share their experiences are in the Clojurians Slack channel and our Gitter channel.