Apache Spark and Lift?


John Andrews

Jan 11, 2017, 10:47:19 AM
to Lift
So I've been developing a Java/Scala/Lift application where I use Java for the calculations (due to legacy code and libraries), and Scala/Lift to bring them to the browser. I've already started working with Spark, using a mix of Scala and Java to work with my existing Java code.

However, I'm running the code as a Spark job on a test cluster: it does the calculations, but once I package the fat jar and deploy it, the job no longer has the Lift app and web client attached. What I actually want is for the Spark job to run in response to actions taken in the Lift application (like a user hitting a 'run calculation' button).

I've looked into Spark Launcher; however, that only seems to work by processing a jar you've uploaded, and the launched program isn't connected to Java/Scala/Lift the way my regular application is. I could have the job write its results to disk and have my Lift application read them from there, but that's clunky and doesn't make sense if I'm running many smaller jobs instead of one large job.
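For reference, the launcher flow I'm describing looks roughly like the sketch below (the jar path, main class, and master URL are placeholders, not my real setup):

import org.apache.spark.launcher.SparkLauncher

// Launches a pre-packaged fat jar as a one-shot Spark job.
// All paths and URLs below are placeholders.
object LaunchJob {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/opt/jobs/calculations-assembly.jar") // the uploaded fat jar
      .setMainClass("com.example.CalculationJob")
      .setMaster("spark://spark-master:7077")
      .launch()
    // Blocks until the job exits; any results land wherever the jar writes them,
    // with no live connection back to the Lift app.
    process.waitFor()
  }
}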

I've also seen this example of someone doing this with the Play framework: https://blog.knoldus.com/2014/06/18/play-with-spark-building-apache-spark-with-play-framework/

Is there a known way to approach this, or at least some guidance? Thanks a lot; if not, I'll be sure to report back here with how it gets solved.

Diego Medina

Jan 11, 2017, 11:18:44 AM
to Lift
At work we don't use Spark, but we have the concept of a jobs server that does heavy calculations on demand, alongside our web app. The way we have them work together is that we use RabbitMQ to send a message from any of our four web servers to the jobs server. The jobs server then runs the task, written either in Scala or in a tool we wrote in Go; both save their results to our database, and Lift reads the results from there.
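As a rough sketch of the web-server side (the queue name and host are made up; this uses the plain RabbitMQ Java client from Scala):

import com.rabbitmq.client.ConnectionFactory

// Publishes a "run this job" message from a web server to the jobs server.
// Host and queue name are placeholders.
object JobPublisher {
  def publish(jobPayload: String): Unit = {
    val factory = new ConnectionFactory()
    factory.setHost("rabbitmq.internal")
    val connection = factory.newConnection()
    val channel = connection.createChannel()
    try {
      // durable = true so queued jobs survive a broker restart
      channel.queueDeclare("jobs", true, false, false, null)
      channel.basicPublish("", "jobs", null, jobPayload.getBytes("UTF-8"))
    } finally {
      channel.close()
      connection.close()
    }
  }
}

The jobs server consumes from that queue, runs the task, and writes results to the database for Lift to read.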

In our case, we want to keep the results in a persistent store, versus just having the results in memory on the jobs server and having Lift display them only once.

Some of our jobs are really fast, just a few seconds, while others can take 8+ hours to process the amount of data we have. This setup has worked well for both.

I'm not sure whether, in your case, the calculation results are still useful after Lift's initial render.

In short, I think having Spark store the results in your database isn't such a bad idea, unless I'm missing critical details about your use case.

Thanks

Diego







--
Diego Medina
Lift/Scala Consultant
di...@fmpwizard.com
https://blog.fmpwizard.com/

John Andrews

Jan 11, 2017, 1:15:10 PM
to Lift
So in short, we have a calculation model that has, say, 100 ticks. We load the page with Lift and use Ajax so that inputs made on the page change the model's parameters and move the model forward 1-100 ticks as desired. We then take the data produced by the model and use a JSON REST endpoint to output the model data, in full or in summary as needed, to the browser. Works great.
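Roughly, with illustrative names rather than our real code, the Lift side looks like:

import net.liftweb.http.SHtml
import net.liftweb.http.js.JsCmds
import net.liftweb.http.rest.RestHelper
import net.liftweb.json.JsonAST.JObject
import net.liftweb.json.JsonDSL._

// Hypothetical stand-in for the real model, which wraps the Java calculation code.
object Model {
  @volatile var tick: Int = 0
  def advance(n: Int): Unit = { tick += n } // the Java calcs would run here
}

object ModelSnippet {
  // Ajax button: advances the model one tick without a page reload.
  def render = SHtml.ajaxButton("Advance 1 tick", () => {
    Model.advance(1)
    JsCmds.Noop
  })
}

// JSON REST endpoint the browser hits for the current model state.
object ModelRest extends RestHelper {
  serve {
    case "api" :: "model" :: Nil JsonGet _ =>
      ("tick" -> Model.tick): JObject
  }
}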

We use Spark to distribute our model for the use cases that need it (for example, copies of the model with varied inputs for Monte Carlo, or a model whose calculations can be parallelized with minimal communication between nodes).
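For the Monte Carlo case, the Spark side is essentially this (the types and the runModel call are illustrative stand-ins for our Java entry points):

import org.apache.spark.SparkContext

object MonteCarlo extends Serializable {
  // Hypothetical stand-ins for our real model inputs/outputs.
  case class Params(seed: Long)
  case class Result(value: Double)

  def runModel(p: Params): Result = Result(p.seed.toDouble) // placeholder for the Java calc

  // Fan out one independent copy of the model per varied parameter set.
  def run(sc: SparkContext, paramSets: Seq[Params]): Seq[Result] =
    sc.parallelize(paramSets).map(runModel).collect().toSeq
}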

The issue is that the Spark 'job' in this case isn't some self-contained calculation; it's driven by inputs from the Lift app. Also, since we're not running straight from start to end, the I/O overhead of storing and reloading the entirety of the model's data just to move forward one tick is too much, and that's what we'd have to do if we used Spark Launcher to launch a one-shot job. What we really want is for the Spark job to start when the user starts the model, run calculations when prompted from the browser, report information back to the browser, and shut down only when the user is done.

Thanks for the help and the response.

Antonio Salazar Cardozo

Jan 11, 2017, 3:37:59 PM
to Lift
I have basically zero Spark experience, but I would guess that the Spark launcher
just sets up some environment and invokes a bootstrapping method in Spark that
gets it communicating with whatever will execute the actual Spark jobs. I think for
what you're trying to do, it may make sense to try to do that bootstrapping inside
Lift's Boot class, giving you an environment that should then be able to communicate
with Spark.
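Totally untested, but I imagine something shaped like this, with the master URL and names invented (SparkSession here is the newer Spark 2.x entry point):

package bootstrap.liftweb

import net.liftweb.http.LiftRules
import org.apache.spark.sql.SparkSession

// App-wide holder so snippets and REST handlers can reach Spark.
object SparkBridge {
  @volatile var session: SparkSession = _
}

class Boot {
  def boot {
    LiftRules.addToPackages("code")

    // Bootstrap Spark once, when the web app starts; the master URL is a placeholder.
    SparkBridge.session = SparkSession.builder()
      .appName("lift-calculations")
      .master("spark://spark-master:7077")
      .getOrCreate()

    // Shut Spark down when the container unloads the app.
    LiftRules.unloadHooks.append(() => SparkBridge.session.stop())
  }
}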

Again, I'm saying this with zeeeero knowledge of Spark, so take it with a grain of
salt. If that doesn't point you in a direction that works, I might be able to make some
time to peek into things a little more deeply.
Thanks,
Antonio

Ethan Jewett

Jan 12, 2017, 4:25:06 PM
to lif...@googlegroups.com
I don't completely follow the question. I'm slightly out of date (SparkContext has to some extent been superseded by SparkSession in Spark 2.x), but I think the basic idea is that you would want to instantiate a SparkContext in your Lift application. You may want to do this per user session or when you start the Lift application, depending on how you want to manage context sharing. The SparkContext can point at a Spark cluster anywhere, and the cluster will load and process data as directed by your job definition (which lives in the Lift application).

This is an example of spinning up a global context pointed at a local Spark instance, but it could just as well point at a cluster: https://github.com/esjewett/lcadata/blob/master/src/main/scala/org/coredatra/bigfilter/model/LocalSparkContext.scala
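In the same spirit, a minimal version of that file looks something like this ("local[*]" runs Spark in-process; swap in a cluster URL for a real deployment):

import org.apache.spark.{SparkConf, SparkContext}

// Lazily initialized, application-wide context, per the example linked above.
object GlobalSparkContext {
  lazy val sc: SparkContext = {
    val conf = new SparkConf()
      .setAppName("lift-spark-example")
      .setMaster("local[*]") // or e.g. "spark://your-cluster:7077"
    new SparkContext(conf)
  }
}

// Any Lift snippet or REST handler can then submit work directly, e.g.:
//   val total = GlobalSparkContext.sc.parallelize(1 to 1000).sum()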

However, I suspect that I’m missing something fundamental about what you are proposing. Can you share an example of the type of job you are trying to run or the specific difficulty you are encountering? I suspect that this is more of a Spark question than a Lift question, but that may clarify the issue.

Ethan

