Deploying Pinot


smpro...@gmail.com

Apr 24, 2018, 11:39:27 AM
to Pinot Users
Hi-

I have a few random questions that I thought this would be the best place to ask:


Is there any documentation on deploying Pinot onto a cluster?

Are there typical tuning settings I should be aware of (recommended heap sizes, Java flags like -XX:MaxDirectMemorySize, etc.)?

What is the typical server spec one would run Pinot on, in terms of memory and cores?

It's probably highly variable, but ballpark, how many servers are typical in a Pinot cluster (small, medium, large)?

It looks like the recommended way to run Pinot is to have storage backed by NFS.  Hadoop has NFS gateway support; is this the recommended approach?  If not, what NFS setup is typical?

Segment creation (index service): is this still in the works, or just on the roadmap at this point?

Would there be any interest in running Pinot on a cluster manager like YARN to make deploying easier?

Thank you!

kishore g

Apr 24, 2018, 12:25:48 PM
to smpro...@gmail.com, Pinot Users
We definitely need more documentation on deploying Pinot in production. I will take a quick stab at answering your questions.

- If you have dedicated nodes and are not running any other apps on them, you can set the table load mode to mmap (see the config sketch after this list). With this, historical nodes can typically run with a 4GB heap. For real-time nodes, we typically run with 16GB of memory; the memory needed is roughly proportional to the number of consuming segments, since once segments are committed they are mmap'ed.
- At LinkedIn, we run on 24-core, 64GB RAM machines. We have a mix of SSD and spinning disk, and we pick between them based on latency, throughput, multitenancy, etc.
- Even though we use NFS at LinkedIn, you can run without NFS as well (Uber runs it using the WebHDFS protocol). There is a proposal to support deep storage.
- The index service is on the roadmap; no work has started on it yet.
- Yes, deploying with YARN, Mesos, etc. would be super cool. There is a PoC to deploy Pinot on Kubernetes.
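
As a rough sketch of what the mmap setting above looks like in a table config (the table name and most values below are hypothetical placeholders; only the tableIndexConfig.loadMode field is meant to reflect Pinot's table config format), expressed in Python just to show the shape:

import json

# Hedged sketch: offline-table config fragment that sets the segment load mode to MMAP.
# Everything except tableIndexConfig.loadMode is a placeholder, not a recommendation.
table_config = {
    "tableName": "myTable_OFFLINE",   # hypothetical table name
    "tableType": "OFFLINE",
    "tableIndexConfig": {
        "loadMode": "MMAP"            # mmap committed segments instead of heap-loading them
    }
}
print(json.dumps(table_config, indent=2))

# With MMAP load mode on dedicated historical nodes, a small heap (e.g. -Xmx4g) is
# typically enough, since segment data sits in the OS page cache rather than on the
# Java heap; -XX:MaxDirectMemorySize would matter mainly where direct (off-heap)
# allocation is used, e.g. for consuming (real-time) segments.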



smpro...@gmail.com

Apr 25, 2018, 1:43:08 PM
to Pinot Users
Kishore-

Thank you, this is very helpful!

> Even though we use NFS at LinkedIn, you can run without NFS as well (Uber runs it using the WebHDFS protocol).

Are the committed segments copied from NFS/WebHDFS to local storage (SSD) and then mmap'ed?  Or, in the case of NFS, are they just mmap'ed directly over NFS?



kishore g

Apr 25, 2018, 1:48:07 PM
to smpro...@gmail.com, Pinot Users
They are copied over from deep storage (which can be NFS, S3, or HDFS) to local storage and mmap'ed. If you are using NFS, it needs to be mounted only on the controllers.
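
A simplified sketch of that load path in Python (not Pinot's actual code; the paths and helper name are hypothetical, and real segments are directories of index files rather than single files):

import mmap
import shutil
from pathlib import Path

def load_segment(deep_store_path: str, local_dir: str) -> mmap.mmap:
    # Copy the committed segment from deep storage (an NFS mount here, but it could
    # equally be a file fetched over WebHDFS or from S3) onto local SSD/disk...
    local_path = Path(local_dir) / Path(deep_store_path).name
    shutil.copy(deep_store_path, str(local_path))
    # ...then mmap it, so queries are served from the OS page cache rather than the Java heap.
    f = open(local_path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)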

thanks,
Kishore G


reinvi...@gmail.com

May 11, 2018, 3:48:01 PM
to Pinot Users
Ah, so if I'm following correctly, the controllers are the only ones that actually read from the backing store (and then copy segments over to the historical nodes)?

Thanks again

kishore g

May 11, 2018, 5:20:31 PM
to reinvi...@gmail.com, Pinot Users
Not really. We store a URI in the segment metadata. The historical nodes use that URI to download the segment. The URI can point to anything (the controller, in the case of NFS mounted on the controller; S3; WebHDFS; etc.).
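
Roughly, that flow looks like the sketch below (the field names, URL, and helper are hypothetical illustrations, not Pinot's real metadata schema; the actual metadata lives in the cluster state and the download/untar logic is more involved):

import urllib.request
from pathlib import Path

# Hypothetical segment metadata as recorded by the controller: the download URI may
# point at the controller itself (NFS mounted on the controller), S3, WebHDFS, etc.
segment_metadata = {
    "segmentName": "myTable_2018-04-24_0",
    "downloadUri": "http://controller-host:9000/segments/myTable/myTable_2018-04-24_0",
}

def fetch_segment(metadata: dict, local_dir: str) -> Path:
    # Historical-node side: resolve the URI from the metadata and download the segment locally.
    dest = Path(local_dir) / metadata["segmentName"]
    urllib.request.urlretrieve(metadata["downloadUri"], str(dest))
    return dest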

thanks,
Kishore G


Mayank Shrivastava

May 11, 2018, 5:39:40 PM
to reinvi...@gmail.com, Pinot Users
Yes, except that the controllers don't copy the segments over to the historical nodes. The controller notifies the historical nodes that new data is ready for them to download, along with the URI, and the nodes download the data.

-mayank
