Elixir distributed stream analytics like Apache Spark

Jacques du Preez

Aug 21, 2015, 12:30:22 PM
to elixir-lang-talk
Hi,

I've just started learning and using Elixir for a private project. At work we are currently investigating Scala and Apache Spark. Personally I find Scala a very cumbersome language, and I'd like to see whether we can use Elixir instead. Is there anything similar to Spark for Elixir? Or could I simply use plain Elixir processes and something like Redis to achieve the same result?

What do you guys think?

Thanks

Paulo Almeida

Aug 22, 2015, 6:43:06 AM
to elixir-lang-talk
Hi,

It depends. If you're considering Spark, I'm sure your use case involves TB or PB of data and tens or hundreds of nodes to crunch it. If that is not the case, you can just use a simpler, custom solution. Maybe you don't even need Redis as an external dependency, since you can use ETS to store computations and rely on Erlang's native inter-node communication. But you still have to solve problems like: How do you partition data and distribute the work across your cluster? What is the recovery strategy in case of failure? Can you afford to partially lose data?
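
A rough sketch of the "plain processes plus ETS, no Redis" idea, under loudly stated assumptions: the workload is a word count (hypothetical; the thread names no workload), the module name `PartitionedCount` is made up for illustration, and local `Task`s stand in for cluster nodes. `:erlang.phash2/2` hashes each key into the range 0..n-1, so every occurrence of the same word is routed to the same worker.

```elixir
defmodule PartitionedCount do
  # Hypothetical illustration: count words by hash-partitioning them
  # across `n` worker tasks. Each task stands in for one cluster node.
  def run(words, n) do
    # Public ETS table so the worker tasks can write {word, count} pairs.
    table = :ets.new(:word_counts, [:set, :public, write_concurrency: true])

    words
    # Stable hash -> partition index in 0..n-1; same word, same partition.
    |> Enum.group_by(fn word -> :erlang.phash2(word, n) end)
    |> Enum.map(fn {_partition, chunk} ->
      Task.async(fn ->
        Enum.each(chunk, fn word ->
          # Atomic increment; inserts {word, 0} first if the key is absent.
          :ets.update_counter(table, word, {2, 1}, {word, 0})
        end)
      end)
    end)
    # Wait for every partition to finish before reading the table.
    |> Enum.each(&Task.await/1)

    result = Map.new(:ets.tab2list(table))
    :ets.delete(table)
    result
  end
end

IO.inspect(PartitionedCount.run(~w(a b a c b a), 4))
# prints %{"a" => 3, "b" => 2, "c" => 1}
```

ETS tables are local to a single node, so a real multi-node version would keep one table per node (starting the work with something like `Node.spawn/2` or `:rpc.call/4`) and merge the partial results. The partitioning, recovery, and data-loss questions above would still be yours to answer.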

My opinion is that if Spark meets all your requirements, and the only downside is having to describe your computations in Scala, maybe you can write a thin layer in Elixir and use JInterface (http://www.erlang.org/doc/apps/jinterface/jinterface_users_guide.html) to interoperate with the JVM. At first glance, ruby-spark (http://ondra-m.github.io/ruby-spark/) seems to do something along these lines. Or just use the Python API that comes out of the box with Spark.

Jacques du Preez

Aug 23, 2015, 7:15:56 AM
to elixir-l...@googlegroups.com
Hi,

Thank you for the excellent suggestions. I appreciate it.

> how to partition data and distribute the work across your cluster?

Yes, these are the kinds of things I would have to solve myself if I don't leverage a framework like Spark, and in the process I would be reinventing the wheel. But it's definitely something to consider, as our use case isn't that complex and Spark might end up being overkill.

> maybe you can make a thin layer in Elixir, and use JInterface (http://www.erlang.org/doc/apps/jinterface/jinterface_users_guide.html) to interop with the JVM.

I really can't comment on this, since I know nothing about JInterface, how well it works, or how easy it is to work with. But on gut feel, based on experience with similar tools in other languages, I suspect this might be a pain to do. Still, it's something to investigate.

> python API that comes out-of-the-box with Spark

Actually, we are considering Clojure too. It seems to have a nice Spark library, and the language itself integrates very well with anything on the JVM.

Thank you,
==============================
Jacques du Preez

Web: medium.com/@jdp
Twitter: @jacquesdp