I am interested in writing a small YARN application that would deploy Pinot onto YARN. This would make it easier for my group to deploy Pinot into clustered environments with a single command. I wanted to run this by folks here to gauge interest in this as a potential contribution.
As a starting point, this is what I envision:
1. a YARN deploy class, and possibly a script wrapper (pinot-yarn-deploy.sh)
PinotYarnDeploy (and a few ancillary classes) would be created and placed inside com.linkedin.pinot.hadoop.yarn in the pinot-hadoop module (or possibly in its own module, e.g. pinot-yarn, if that would make more sense).
PinotYarnDeploy would use args4j to accept YARN and Pinot properties such as numControllers, numBrokers, numServers, and settings for each container type (cpus, memory, container overhead, etc.).
2. a basic YARN application master with a web UI that would:
a. show where things are running
b. allow one to add/remove controllers, brokers, and servers from the application master UI
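To make the option set in item 1 a bit more concrete, here is a rough sketch of the kind of options holder I have in mind. It is only an illustration: the class name, the serverCpus and serverMemoryMb options, and all default values are made up, and the parsing is hand-rolled plain Java so the snippet has no dependencies. In the real thing, the fields would presumably be args4j-annotated and parsed by args4j instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical options holder; names other than numControllers, numBrokers
// and numServers, and every default value, are assumptions for illustration.
final class PinotYarnDeployOptionsSketch {
    int numControllers = 1;
    int numBrokers = 1;
    int numServers = 1;
    int serverCpus = 1;          // assumed per-container setting
    int serverMemoryMb = 4096;   // assumed per-container setting

    // Parses "-name value" pairs; stands in for what args4j would do.
    static PinotYarnDeployOptionsSketch parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected -name value pairs");
        }
        Map<String, String> raw = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("-")) {
                throw new IllegalArgumentException("not an option: " + args[i]);
            }
            raw.put(args[i].substring(1), args[i + 1]);
        }
        PinotYarnDeployOptionsSketch o = new PinotYarnDeployOptionsSketch();
        o.numControllers = intOpt(raw, "numControllers", o.numControllers);
        o.numBrokers = intOpt(raw, "numBrokers", o.numBrokers);
        o.numServers = intOpt(raw, "numServers", o.numServers);
        o.serverCpus = intOpt(raw, "serverCpus", o.serverCpus);
        o.serverMemoryMb = intOpt(raw, "serverMemoryMb", o.serverMemoryMb);
        // Anything left over was not recognized.
        if (!raw.isEmpty()) {
            throw new IllegalArgumentException("unknown options: " + raw.keySet());
        }
        return o;
    }

    // Removes the option from the map and returns its value, or the default.
    private static int intOpt(Map<String, String> raw, String name, int def) {
        String value = raw.remove(name);
        return value == null ? def : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        PinotYarnDeployOptionsSketch o =
            parse(new String[] {"-numServers", "3", "-serverMemoryMb", "8192"});
        System.out.println(o.numControllers + " controller(s), "
            + o.numBrokers + " broker(s), " + o.numServers + " server(s)");
    }
}
```

The script wrapper would then just forward its arguments to the deploy class.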
When running in YARN, all of the storage modes should be supportable:
* HDFS-backed: containers would copy segments out of HDFS, which means extra copying and storage but should be fine
* NFS-backed: we'd probably use the HDFS NFS gateway, as that allows storage in HDFS and supports mmapping files directly out of HDFS
Beyond that, I would be interested in adding the following as well:
* allow services to bind to random ports if passed 0 as the port number
* pinot-admin.sh commands that take -controllerHost and -controllerPort would optionally be able to take -zkAddress instead, which would choose a controller at random and then connect to it using the host and port it published in ZooKeeper (i.e., this just makes it easy to work with random ports)
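A minimal sketch of the two ideas above, using only the JDK. Binding a ServerSocket to port 0 makes the OS pick a free port, which can then be read back with getLocalPort(). The ZooKeeper part is only simulated here with an in-memory list of "host:port" strings; the class, the HostPort type, and the entry format are all hypothetical, not existing Pinot code.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class RandomPortSketch {

    // Hypothetical holder for a controller address published in ZooKeeper.
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Binds to the requested port; 0 asks the OS for any free port.
    static ServerSocket bind(int requestedPort) throws IOException {
        return new ServerSocket(requestedPort);
    }

    // Picks one "host:port" entry at random and parses it.
    // The entry format is an assumption made for this sketch.
    static HostPort pickController(List<String> published) {
        if (published.isEmpty()) {
            throw new IllegalStateException("no controllers registered");
        }
        int choice = ThreadLocalRandom.current().nextInt(published.size());
        String entry = published.get(choice);
        int idx = entry.lastIndexOf(':');
        return new HostPort(entry.substring(0, idx),
                            Integer.parseInt(entry.substring(idx + 1)));
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bind(0)) {
            int actualPort = socket.getLocalPort();
            // In the proposal this value would be written to ZooKeeper;
            // here it just goes into a local list.
            List<String> published =
                Collections.singletonList("localhost:" + actualPort);
            HostPort chosen = pickController(published);
            System.out.println("connecting to " + chosen.host + ":" + chosen.port);
        }
    }
}
```

With something like this in place, a client only needs the ZooKeeper address and never has to know which ports the containers ended up on.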
Any input is appreciated and welcome, thank you!