ScalaJulia (Scala + Julia) for Scientific computing and Machine Learning on Spark

ssarkaray...@gmail.com

Oct 27, 2015, 7:26:56 PM
to scala-user

Is it worth developing a new language, ScalaJulia, in which the syntax and semantics of both functional languages, Scala and Julia, are integrated into one
functional language? This new language should compile and execute all programs written in Scala and Julia separately. It should also compile and
execute all programs written in mixed syntax. Scala classes should be able to access Julia functions through native interfaces.

Will this create an easier path towards a scalable machine learning platform with Julia, using Spark, HDFS, and other tools?

Thanks,
SS

Rex Kerr

Oct 27, 2015, 7:51:09 PM
to ssarkaray...@gmail.com, scala-user
No, this isn't necessary.  There are Python bindings for Spark, for example, which work reasonably well.  The same could be done for Julia.  Creating a whole new language is a huge undertaking.  It's much easier to just write more bindings.

That said, having some work in Julia/Scala interop would be nice.  JVM-non-JVM interop is usually rather hairy, so anything that can make it less agonizing (e.g. rJava) is welcome to those who use both tools.

  --Rex



Oliver Ruebenacker

Oct 28, 2015, 9:12:13 AM
to Rex Kerr, ssarkaray...@gmail.com, scala-user

     Hello,

  "Mixed syntax" sounds tough. Presumably, some code fragments are valid in both Scala and Julia but have a different meaning in each. So, you would have to mark which code is Scala and which is Julia. The easiest way is probably to just put them into separate files suffixed ".scala" or ".jl", i.e. no mixed syntax.

  When data flows from one language to another and the type systems are different, type information is lost. This means that either returned types will be too generic (e.g. AnyRef, or Seq[AnyRef]) or you specifically request a type (e.g. getInt), but then there is no compile time check. This typically means that you want to minimize the use of the interface between the two languages.
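
  To illustrate the loss of type information described above, here is a minimal Scala sketch. It does not call Julia at all: `callForeign` is a hypothetical stand-in for a cross-language bridge, and `getDouble` is a hypothetical typed accessor in the style of `getInt`. The point is only that the compiler accepts any function name, so a mismatch is found at runtime rather than at compile time.

```scala
// Hypothetical sketch of an untyped language boundary; no real bridge library is used.
object UntypedBridgeSketch {

  // Stand-in for a call into another language: the result is only known as Any.
  def callForeign(name: String, args: Any*): Any = name match {
    case "sphere_vol" =>
      val r = args.head.asInstanceOf[Double]
      4.0 / 3.0 * math.Pi * math.pow(r, 3)
    case "greeting" =>
      "hello"
    case other =>
      throw new NoSuchElementException(s"unknown function: $other")
  }

  // Typed accessor: compiles for any function name, fails at runtime on a mismatch.
  def getDouble(name: String, args: Any*): Double =
    callForeign(name, args: _*) match {
      case d: Double => d
      case x =>
        throw new ClassCastException(s"$name returned ${x.getClass.getName}, not Double")
    }
}
```

  Both `getDouble("sphere_vol", 1.0)` and `getDouble("greeting")` compile; only the second one throws when it runs, which is why keeping the interface between the two languages small tends to help.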

     Best, Oliver
--
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute

ssarkaray...@gmail.com

Oct 28, 2015, 1:01:46 PM
to scala-user, ich...@gmail.com, ssarkaray...@gmail.com
Hello,

I made the following suggestion in the Julia-user group:

---------------------------------------------------------------------------------------
"How about the idea that any function defined with a Julia tag becomes a member of a first-class object named Julia in a combined ScalaJulia language:

Julia function sphere_vol(r)
    # julia allows Unicode names (in UTF-8 encoding)
    # so either "pi" or the symbol π can be used
    return 4/3*pi*r^3
end
is translated into the ScalaJulia language as:

object Julia {

    def sphere_vol(r : Int) {

      return ( 4/3*pi*r^3 )
} }


Object Julia will receive all top-level functions defined in Julia syntax. This is the simplest way to encapsulate
Julia functions and inherit them in other Scala objects or classes in the ScalaJulia language.

This will preserve simplicity and performance of Julia functions."

-------------------------------------------------------------------------------------------------------------------------------

I got the following comment from one of the core developers of the Julia language:

--------------------------------------------------------------------------------------------------------------------------------
"Excellent idea. You should make a PR against the Scala GitHub repo and see how it goes."
------------------------------------------------------------------------------------------------------------------------------

I am now curious about comments and suggestions from Scala users and developers.  
My plan is to use Scala to access Julia functions using Native Interfaces (like JNA or JNI).

Thanks,
SS

Simon Ochsenreither

Oct 28, 2015, 1:15:04 PM
to scala-user, ich...@gmail.com, ssarkaray...@gmail.com
I don't want to sound discouraging, but I think you underestimate the complexity of this endeavor.
Things could work reasonably well in simple cases, but things will fall apart quickly in more complex ones.

Have a look at how much pain it is for Scala to provide decent interop with Java in the compiler, despite the fact that:

- Scala and Java can't be mixed in the same source file
- Scala and Java run on the same runtime
- Many things in Scala are a fix/generalization of Java "features"

Your proposal would have none of these advantages.

I think things like these were quite popular in the early days of programming languages, and have been abandoned over the last 15 to 20 years (search for ESQL), because it's incredibly hard to get even a single decent programming language to work in isolation.

Simon Ochsenreither

Oct 28, 2015, 1:21:21 PM
to scala-user, ich...@gmail.com, ssarkaray...@gmail.com
Making two languages work together like this feels multiple decades away from the current state of programming language design.

Additionally, you'll have to use JNI, whose key design goal was to be completely terrible to encourage people to write more "pure Java" code.
I think JNR will make many things better, such as (afaik) the ability to bind things dynamically, with code, at runtime, instead of having to generate JNI code and compile it ahead of time.

ssarkaray...@gmail.com

Oct 28, 2015, 3:16:01 PM
to scala-user, ich...@gmail.com, ssarkaray...@gmail.com
I need to talk about the real motivation for ScalaJulia language. There are many mathematical functions in Julia 
library that are much faster and can be integrated in machine learning techniques. Users can write new functions in Julia
and it is possible to generate wrappers.

Please see our discussion topics in an upcoming meetup:

We like to have all good features, simplicity and performance in Julia (for Calculus, Statistics, Linear Algebra and other functions) and also
we like to have parallel, memory based big data computations in Spark (written in Scala).

ScalaJulia will be a first step in that direction.

Thanks,
SS

Rex Kerr

Oct 29, 2015, 10:05:44 AM
to ssarkaray...@gmail.com, scala-user
On Wed, Oct 28, 2015 at 12:16 PM, <ssarkaray...@gmail.com> wrote:
I need to talk about the real motivation for ScalaJulia language. There are many mathematical functions in Julia 
library that are much faster

Much faster than _what_?  It's not terribly hard to write Julia that is slower than Scala--just be a little careless with type stability (or be careful, but do it in a way that the compiler doesn't understand is safe).
 
and can be integrated in machine learning techniques. Users can write new functions in Julia
and it is possible to generate wrappers.

Users can also write new functions in Scala.
 

Please see our discussion topics in an upcoming meetup:

We like to have all good features, simplicity and performance in Julia (for Calculus, Statistics, Linear Algebra and other functions) and also
we like to have parallel, memory based big data computations in Spark (written in Scala).

You haven't made a good enough case that it is worth the frankly enormous time and effort of creating a new language.  Don't fool yourself into thinking that "ScalaJulia" isn't a new language.  You wrote, for your toy example,
  4/3*pi*r^3
Well, in Scala 4 and 3 are integers, so 4/3 is 1.  ^ is not defined on Double so you could add it, but ^ is already defined on integers as bitwise xor (like C), so 2^7 is not 128 but 5.  That'd be confusing!  So you can't even write simple arithmetic in this hybrid language and be able to use existing code.
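
The arithmetic above can be checked in plain Scala. This is a minimal sketch; the names `ArithmeticSketch` and `sphereVol` are made up for illustration, and the last method shows one way the toy example would have to be written in Scala to give the intended result.

```scala
// Sketch: how Scala evaluates the pieces of 4/3*pi*r^3.
object ArithmeticSketch {
  // Integer division truncates, so 4/3 is 1, not 1.333...
  val intDivision: Int = 4 / 3

  // ^ on Int is bitwise xor: binary 010 xor 111 is 101, i.e. 5.
  val xor: Int = 2 ^ 7

  // Exponentiation needs an explicit library call.
  val power: Double = math.pow(2, 7)

  // The sphere volume written so that Scala computes what the Julia code means.
  def sphereVol(r: Double): Double =
    4.0 / 3.0 * math.Pi * math.pow(r, 3)
}
```

So the same characters, `4/3*pi*r^3`, cannot keep both their Scala meaning and their Julia meaning in a single hybrid language.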

It's a new language with its own syntax.

You haven't even said whether this language is going to run on the JVM or not.  If it's going to run on the JVM, come up with a Julia-JVM compatibility layer first before you think about trying to squash two languages together!  Then, once you have a good interop story, think about whether it's worth all the extra complexity of a new language.

If it's not supposed to run on the JVM, you're not going to be able to compile Spark for "ScalaJulia" anyway.  So "just" port Spark to Julia and see if the expected performance benefits materialize.

  --Rex

ssarkaray...@gmail.com

Oct 29, 2015, 2:54:19 PM
to scala-user, ssarkaray...@gmail.com
Please see my comments below.  Thanks for the questions. Let us discuss more.


On Thursday, October 29, 2015 at 7:05:44 AM UTC-7, Rex Kerr wrote:


On Wed, Oct 28, 2015 at 12:16 PM, <ssarkaray...@gmail.com> wrote:
I need to talk about the real motivation for ScalaJulia language. There are many mathematical functions in Julia 
library that are much faster

Much faster than _what_?  It's not terribly hard to write Julia that is slower than Scala--just be a little careless with type stability (or be careful, but do it in a way that the compiler doesn't understand is safe).

Please see the links for some interesting performance comparisons:



If you are not careful about optimization, Scala performance can be really bad.
Julia uses some of the latest compiler optimization techniques: http://julialang.org/

 
and can be integrated in machine learning techniques. Users can write new functions in Julia
and it is possible to generate wrappers.

Users can also write new functions in Scala.
 
Julia has a large set of library functions for scientific computing. Julia replaced MATLAB in scientific computing.
I saw several articles from scientists.
 

Please see our discussion topics in an upcoming meetup:

We like to have all good features, simplicity and performance in Julia (for Calculus, Statistics, Linear Algebra and other functions) and also
we like to have parallel, memory based big data computations in Spark (written in Scala).

You haven't made a good enough case that it is worth the frankly enormous time and effort of creating a new language.  Don't fool yourself into thinking that "ScalaJulia" isn't a new language.  You wrote, for your toy example,
  4/3*pi*r^3
Well, in Scala 4 and 3 are integers, so 4/3 is 1.  ^ is not defined on Double so you could add it, but ^ is already defined on integers as bitwise xor (like C), so 2^7 is not 128 but 5.  That'd be confusing!  So you can't even write simple arithmetic in this hybrid language and be able to use existing code.


This was just an example. In Scala, traits can be defined with implementations in Julia. 


It's a new language with its own syntax.

You haven't even said whether this language is going to run on the JVM or not.  If it's going to run on the JVM, come up with a Julia-JVM compatibility layer first before you think about trying to squash two languages together!  Then, once you have a good interop story, think about whether it's worth all the extra complexity of a new language.


This is where we need to brainstorm. 
C (and Java) programs can embed Julia through APIs :  http://docs.julialang.org/en/release-0.4/manual/embedding/
 
If it's not supposed to run on the JVM, you're not going to be able to compile Spark for "ScalaJulia" anyway.  So "just" port Spark to Julia and see if the expected performance benefits materialize.

  --Rex


Spark, Spark Streaming, MLlib, and the rest of the evolving big data ecosystem can utilize the scientific and mathematical strengths of Julia.



Thanks,
SS
 

Rex Kerr

Oct 29, 2015, 3:23:52 PM
to ssarkaray...@gmail.com, scala-user
On Thu, Oct 29, 2015 at 11:54 AM, <ssarkaray...@gmail.com> wrote:
Please see the links for some interesting performance comparisons:



Yes, I'm aware of those.  I keep hoping someone will publish the full Computer Language Benchmarks Game tests.  Those benchmarks test a really narrow set of capabilities.
 

Er, this article says that Java performed a benchmark-specific micro-optimization on the Java code but not the Scala code, and that the really fast Java time wasn't reflective of anything useful.
 
If you are not careful about optimization, Scala performance can be really bad.

Indeed.  Just like Julia.  Careless use of either will be slow.  (Heck, careless use of C is slow.  You always have to be somewhat careful.)
 
Julia uses some of the latest compiler optimization techniques: http://julialang.org/

 
and can be integrated in machine learning techniques. Users can write new functions in Julia
and it is possible to generate wrappers.

Users can also write new functions in Scala.
 
Julia has a large set of library functions for scientific computing.

Indeed, but it's easier to rewrite those for Scala than it is to create a new language.
 
Julia replaced MATLAB in scientific computing.

Ehm, what?  I can think of at least one sizable company with sales numbers that would indicate otherwise.
 
I saw several articles from scientists.

Some people use Julia, yes.  Including me.  It's very promising!  But let's be realistic about penetration, etc.
 
 

You haven't made a good enough case that it is worth the frankly enormous time and effort of creating a new language.  Don't fool yourself into thinking that "ScalaJulia" isn't a new language.  You wrote, for your toy example,
  4/3*pi*r^3
Well, in Scala 4 and 3 are integers, so 4/3 is 1.  ^ is not defined on Double so you could add it, but ^ is already defined on integers as bitwise xor (like C), so 2^7 is not 128 but 5.  That'd be confusing!  So you can't even write simple arithmetic in this hybrid language and be able to use existing code.


This was just an example. In Scala, traits can be defined with implementations in Julia.

You didn't address my point, which is that you have invented a new language which is not Scala and probably isn't Julia either.
 
 


It's a new language with its own syntax.

You haven't even said whether this language is going to run on the JVM or not.  If it's going to run on the JVM, come up with a Julia-JVM compatibility layer first before you think about trying to squash two languages together!  Then, once you have a good interop story, think about whether it's worth all the extra complexity of a new language.


This is where we need to brainstorm. 
C (and Java) programs can embed Julia through APIs :  http://docs.julialang.org/en/release-0.4/manual/embedding/

Brainstorm what?  You've got docs for how to do it in C, and you have docs for how to use JNI, so why not give it a go and find out what is easy or hard about it?

It's waaaaaay easier to write a tool that does that automatically than it is to write a new language.
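
As a starting point for that experiment, the Scala side of a JNI binding might look like the sketch below. Everything named here is hypothetical: `JuliaJniSketch`, `evalDouble`, and the shim library name `juliashim` are assumptions, and the C shim itself (which would call into Julia's embedding API described in the linked manual) is not shown. Without that native library the call fails with `UnsatisfiedLinkError`, which the sketch catches so that it still runs on a plain JVM.

```scala
// Hypothetical sketch of the JVM side of a JNI binding; the native shim is NOT provided.
object JuliaJniSketch {

  // Declared here, implemented in native code. The C side would have to be
  // written and compiled separately against Julia's embedding API.
  @native def evalDouble(expr: String): Double

  // Try to load the (assumed) shim library; report failure instead of throwing.
  def loadShim(libName: String = "juliashim"): Boolean =
    try { System.loadLibrary(libName); true }
    catch { case _: UnsatisfiedLinkError => false }

  // Safe wrapper: Left with a message if the native method is not linked.
  def tryEvalDouble(expr: String): Either[String, Double] =
    try Right(evalDouble(expr))
    catch {
      case e: UnsatisfiedLinkError => Left(s"native shim not available: ${e.getMessage}")
    }
}
```

Writing the shim by hand for one or two functions is a quick way to find out which parts (type conversion, memory ownership, threading) are the hard ones before automating anything.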

  --Rex

Haoyi Li

Oct 29, 2015, 3:52:57 PM
to Rex Kerr, ssarkaray...@gmail.com, scala-user
Heck, careless use of C is slow.  You always have to be somewhat careful.

Yeah! If you're not careful you'll end up writing a Python interpreter and then you'll know what poor performance means =P

Rex Kerr

Oct 29, 2015, 3:59:01 PM
to Haoyi Li, ssarkaray...@gmail.com, scala-user
Heh, well, I was specifically thinking of careless use of malloc/free being waaaaay slower than GC for piles of tiny objects.

But, yeah, I hate it when I accidentally write a Python interpreter and things then take forever to run.

  --Rex
