Vertx on Hadoop YARN

Oliver Rolle

Dec 14, 2014, 1:04:18 PM

to vertx-...@googlegroups.com

Hi community,

On Friday Todor and I attended a hackathon. We built a pilot that runs the vertx platform on Hadoop YARN. It's available at https://github.com/orolle/vertx-on-yarn. See the README to test the pilot yourself. You need more than 4 GB of memory for the VM to be able to run the pilot.

Hadoop YARN is like a cluster OS which negotiates computing resources for applications. vertx-on-yarn allocates resources to containers on a YARN cluster and launches a vertx platform instance in each container, using twill (twill simplifies the YARN API, which is a mess). Within each container a single vertx platform instance is started, on which you can deploy modules and verticles.

In the pilot, the platform deploys crashub SSH, which you can log in to and use to interact with vertx. It also deploys a JavaScript verticle, which proves that we can use any supported language to write applications that run in a distributed way on any YARN cluster. This makes scaling easy, because YARN is the de facto standard for Big Data clusters, while you can still use your favorite language to build the scalable application. This combination is a unique feature; I have not seen similar functionality in any Big Data project or within the web development community! Last but not least, I see vertigo in this context as an abstraction of the event bus, making it easy to build scalable communication between module instances.

The business process stuff I am working on is there to help build scalable business processes on top of vertigo. It needs more research, though: I started this project in a naive way, and the presentation layer (i.e. the web service) is problematic to connect to the vertigo network which executes the business process.

A very important decision we have to make is how we provide web service functionality with vertigo (maybe we use the yoke module or another 3rd party? maybe we provide something ourselves?). Once we have that, we have a usable, scalable web framework for developing applications on Big Data clusters. On the other hand, if the platform or something similar were available in vertx3, I would work on the UI and a module registry, as this would allow publishing functionality similar to npm for node.js - damn it, I am indecisive!

Before I publish vertx-on-yarn on the vertx mailing list, I would like to have feedback from you!

best regards
Oli

Jordan Halterman

Dec 15, 2014, 3:41:41 AM

to vertx-...@googlegroups.com

Wow that is amazing!

I saw someone else made a ZooKeeper cluster manager for Vert.x 3 (I hate ZK). That made me interested in writing a Mesos cluster manager. I have never used YARN, but I know it's a great next-gen technology, and this looks really interesting! I'll have to play with it when I get more time.

Oliver Rolle

Dec 15, 2014, 8:31:26 AM

to vertx-...@googlegroups.com

Mesos and YARN are both cluster resource negotiators. Atm Mesos is the better technology imo (e.g. Docker support, nicer API, native C++ / faster), but YARN has more backing and is getting similar features in the future (e.g. Docker).

My first observation is that actors in the big data technology market are adopting YARN and only a few are adopting Mesos. My second assumption is that a cluster will run only one resource negotiator (operating costs are lower because staff need to know and manage only one technology). This leads to the conclusion that in the end only one cluster resource negotiator will be installed in a given big data cluster, and YARN is more likely to be that choice than Mesos.

The YARN API is crap! ZK is crap! We stumbled over twill, which simplified the development a lot in that respect. Implementing against the YARN API directly takes at least about 2000 lines of code (the Hello world example has 360 LOC), while the pilot has 200 LOC including scaling instances and service port discovery! The only drawback with twill is that we cannot negotiate resources more precisely, so multiple vertx instances can be deployed on the same physical machine, which is not optimal for vertx execution as their event loops compete for cpu cores.
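
The contention point above can be made concrete with a little arithmetic. The following is a minimal sketch and not part of the pilot: the class EventLoopContention and its methods are hypothetical names, and the number of event loops per vertx instance is passed in as an assumption rather than read from vertx.

```java
public class EventLoopContention {

    // Hypothetical helper: event-loop threads per core on a single host.
    // loopsPerInstance is an assumption supplied by the caller; it is not
    // queried from vertx or from YARN.
    public static double loopsPerCore(int instancesOnHost, int loopsPerInstance, int coresOnHost) {
        if (instancesOnHost < 0 || loopsPerInstance < 0 || coresOnHost <= 0) {
            throw new IllegalArgumentException("counts must be non-negative and cores must be positive");
        }
        return (double) instancesOnHost * loopsPerInstance / coresOnHost;
    }

    // True when more event-loop threads exist than cores, so loops must share cores.
    public static boolean oversubscribed(int instancesOnHost, int loopsPerInstance, int coresOnHost) {
        return loopsPerCore(instancesOnHost, loopsPerInstance, coresOnHost) > 1.0;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumption for illustration only: one event loop per core per instance.
        int loopsPerInstance = cores;
        for (int instances = 1; instances <= 4; instances++) {
            System.out.printf("instances=%d loops/core=%.1f oversubscribed=%b%n",
                    instances,
                    loopsPerCore(instances, loopsPerInstance, cores),
                    oversubscribed(instances, loopsPerInstance, cores));
        }
    }
}
```

For example, with 4 instances of 8 event loops each on an 8-core host, loopsPerCore returns 4.0, meaning four event-loop threads contend for each core. With a single instance on that host the ratio is 1.0 and nothing is oversubscribed.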

Oliver Rolle

Dec 15, 2014, 1:32:43 PM

to vertx-...@googlegroups.com

Btw we preferred twill / YARN to Mesos because (1) Cloudera publishes a VM that provides everything we needed (Hadoop YARN ready to use + Eclipse) and (2) the twill API is simpler than the Mesos API.
