Integration of Vert.x and Apache Spark Possibilities


kameshwar mavuri

Sep 2, 2016, 1:52:43 AM
to vert.x
Hi,

We have a project in which we have integrated Vert.x with Apache Spark MLlib. We use Vert.x to receive data from and send data to Ember, and Apache Spark MLlib to run predictions on that data.

The problem we are facing:
When a request comes from Ember to Vert.x, a verticle prepares the data and sends it to the Apache Spark cluster. Since Vert.x is asynchronous, the verticle sends null data back before the Apache Spark prediction completes.

Is there a way to handle this scenario, so that the verticle behaves synchronously?

Alexander Lehmann

Sep 2, 2016, 8:05:42 AM
to vert.x
My Spark experience is limited, but I assume you start a job on the cluster by running a command (spark-submit) or by connecting to a socket to do the same. The job will run for some time and report the result back by creating a file or opening a socket again.

Since you usually do not wait for anything in Vert.x because of its async operation, you would need to write a handler that sends the result event to Vert.x, depending on how the result is received.

In the Vert.x HTTP handler, you would just do something like this (assuming you have a spark object that handles the submission and the result handler):

spark.submitJob(job, r -> { /* result code ... */ });

and write the result to the response in the result handler.
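
A minimal Java sketch of the handler pattern described above, using only the JDK so it runs without a cluster. SparkClient, submitJob and predict are hypothetical names, and predict just sleeps as a stand-in for the real Spark call. The point is that the response is written inside the result handler, so it cannot be sent before the prediction exists.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical wrapper: "SparkClient" and "submitJob" are illustrative names,
// not part of Spark or Vert.x.
final class SparkClient {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Runs the blocking prediction off the caller's thread and passes the
    // result to the handler once it is available.
    void submitJob(String input, Consumer<String> resultHandler) {
        workers.submit(() -> {
            String prediction = predict(input); // blocks this worker thread only
            resultHandler.accept(prediction);
        });
    }

    // Placeholder for the real Spark MLlib call (assumption: it blocks).
    private String predict(String input) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "prediction-for-" + input;
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SparkClient spark = new SparkClient();
        CountDownLatch done = new CountDownLatch(1);
        // The "response" is produced inside the handler, after the prediction.
        spark.submitJob("row-1", prediction -> {
            System.out.println("response body: " + prediction);
            done.countDown();
        });
        System.out.println("request accepted, caller not blocked");
        done.await();
        spark.shutdown();
    }
}
```

In a real verticle the handler would write to the HTTP response rather than print, and it would need to hop back onto the Vert.x context before touching it.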

Alexander Lehmann

Sep 2, 2016, 8:23:45 AM
to vert.x
Actually, it's probably easiest to look at the code that a synchronous app uses to access the system, and modify that to use async operations.

Clement Escoffier

Sep 3, 2016, 2:14:19 AM
to ve...@googlegroups.com
Hi,

We will need to see some code to understand the problem.
It's clear that you need to wait until the prediction has been received before writing the data back. Do you have a way to be notified when this operation has completed?

Clement

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/d76f2acf-9ff8-44ae-9e2b-dac0e367478e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jochen Mader

Sep 3, 2016, 2:14:33 AM
to ve...@googlegroups.com
It's hard to tell from your description what you are trying to achieve.

You can submit a Spark job directly from Vert.x and wait for a result. For that you have to use executeBlocking. Use a WorkerExecutor, because these jobs can block for quite some time (minutes, hours, or days, depending on what your job is doing), and a dedicated executor keeps them from impacting Vert.x itself:

(All of the following is pseudo-code I just scribbled together off the top of my head ...)
Let's take a small Spark-job:

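import org.apache.spark.{SparkConf, SparkContext}

// filePath is passed in here; the original snippet left it undefined.
class SparkJob(filePath: String) extends Serializable {
  // Counts the lines containing "wtf"; blocks until the Spark job finishes.
  def getCount(): Long = {
    val conf = new SparkConf()
      .setAppName("wordCounter")
      .setMaster("spark://127.0.0.1:7077")
    val sc = new SparkContext(conf)
    try {
      val file = sc.textFile(filePath)
      file.filter(_.contains("wtf")).count()
    } finally {
      sc.stop() // release the resources held by this context
    }
  }
}

Using this job inside Vert.x could look like this:

// With the Vert.x 3 callback API, the blocking code completes the future
// it is handed (h) instead of returning a value.
vertx.createSharedWorkerExecutor("sparkWorker").executeBlocking(h => {
  val job = new SparkJob(filePath)
  h.complete(job.getCount())
}, res => { <use result> })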
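
For anyone who wants to try the shape of executeBlocking without Vert.x or Spark on the classpath, here is a rough JDK-only analogue in Java. WorkerOffload and its executeBlocking method are hypothetical and only mimic the idea: the blocking code runs on a dedicated pool, and a handler later receives either the result or the failure. It is not how Vert.x implements it, and the supplier returning 42 is a stand-in for job.getCount().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical helper that imitates the executeBlocking idea with plain JDK classes.
final class WorkerOffload {
    // Dedicated pool, so long-running jobs never occupy request-handling threads.
    private final ExecutorService sparkWorkers = Executors.newFixedThreadPool(4);

    // Runs blockingCode on the worker pool. The handler receives the result, or
    // a non-null Throwable on failure (wrapped in a CompletionException).
    <T> void executeBlocking(Supplier<T> blockingCode,
                             BiConsumer<T, Throwable> resultHandler) {
        CompletableFuture.supplyAsync(blockingCode, sparkWorkers)
                .whenComplete(resultHandler);
    }

    void shutdown() {
        sparkWorkers.shutdown();
    }

    public static void main(String[] args) {
        WorkerOffload offload = new WorkerOffload();
        offload.<Long>executeBlocking(
                () -> 42L, // stand-in for job.getCount()
                (count, err) -> {
                    if (err != null) {
                        System.err.println("job failed: " + err);
                    } else {
                        System.out.println("count = " + count);
                    }
                    offload.shutdown();
                });
    }
}
```

One difference worth keeping in mind: in this sketch the handler usually runs on the worker thread, whereas Vert.x delivers the executeBlocking result back on the caller's context.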

Did that help?




