I feel that on the syntax level, Julia sacrificed too much elegance trying to be compatible with textbook math notation


Siyi Deng

May 24, 2016, 5:23:00 PM5/24/16
to julia-users

NumPy arrays operate element-wise by default and have broadcasting (binary singleton expansion) behavior built in. In Julia you have to write the dotted operators (.>, .<, .==) all the time.
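To illustrate the complaint, a minimal sketch of the dotted operators in Julia (the array values are just illustrative):

```julia
x = [1, 2, 3]
y = [3, 2, 1]

x .> y    # element-wise: [false, false, true] (a BitVector)
x .== y   # element-wise: [false, true, false]
x .< 2    # broadcasting against a scalar: [true, false, false]
```

The undotted `>`, `<`, `==` do not broadcast over arrays; each element-wise comparison needs its dot.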

Isaiah Norton

May 24, 2016, 5:31:29 PM5/24/16
to julia...@googlegroups.com
This has been a point of (extremely) lengthy discussion. See the following issue for recent improvements and plans, some of which will be in the next release:

https://github.com/JuliaLang/julia/issues/16285

Gabriel Gellner

May 24, 2016, 5:57:58 PM5/24/16
to julia-users
I don't think those discussion points will make Siyi happy ;) We are getting even more required use of dotted "broadcasting", not less, as he wants.
I much, much prefer being explicit with the dots, but like all syntax discussions this one seems to get heated :)

Steven G. Johnson

May 24, 2016, 7:01:23 PM5/24/16
to julia-users
It's not just a question of syntactic convenience or brevity.  There is a huge semantic benefit to using dots for broadcasting operations: they explicitly inform the compiler, at the syntax level, that a broadcast is intended.   We aren't exploiting it yet, but soon I'm hoping to have automated loop fusion for dotted broadcasts.

In contrast, if you write "x ^ y", the compiler can't detect that you want a broadcast until quite late, if ever.  First, you have to know that x and y are vectors, which doesn't happen until compile-time after type-inference occurs (assuming the types are inferred correctly).   Second, ^(x::Vector, y::Vector) can literally be anything — the whole point of Julia's design is that functions like ^ are just ordinary functions implemented in Julia itself, with no special status in the compiler.   So, at compile time, once you figure out the types of x and y and know what ^ method to dispatch to, the compiler has to look "inside" the ^ function and figure out somehow that it is a broadcast.   The problem with that kind of compiler "magic" is that it tends to be quite brittle, and is easily confused by functions that aren't written in a very special style.

Whereas if you write sin.(x).^y, the compiler knows at parse time that broadcast is intended, and can transform it at parse time to broadcast((x,y) -> sin(x)^y, x, y), i.e. a single fused loop, without knowing anything about the types of x and y or the functions sin and ^.   Again, this doesn't happen yet, but I'm hoping it will happen soon.  (And since this becomes a parse-time guarantee, rather than a compiler optimization that may or may not occur, you don't have to worry about purity and side effects: an expression like sin.(x).^y will be defined as the fused loop, so we don't have to check whether it is equivalent to the unfused loop.)
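A sketch of the equivalence Steven describes, spelled out with the explicit broadcast call (the input values here are illustrative; in released Julia with dot-call syntax the two forms agree element-by-element):

```julia
x = [0.0, π/2, π]
y = [1.0, 2.0, 3.0]

# dotted form: the parser sees the broadcast at parse time
a = sin.(x) .^ y

# the explicit fused-loop form from the post
b = broadcast((xi, yi) -> sin(xi)^yi, x, y)

a == b   # the same scalar kernel, so identical results
```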

(The key feature that has opened up this possibility is that anonymous functions are now fast, so that broadcast(+, x, y) is now as fast as the specialized loop for .+ that we have now.)

Steven G. Johnson

May 24, 2016, 7:10:02 PM5/24/16
to julia-users
(There is also the original benefit discussed in #8450: when you write a function f(x::Number), you don't have to decide in advance whether you need a separate "vector" version of it too.  You can just write f.(x) for any scalar function f.)
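A small sketch of that pattern (the function name `relu` is just an illustrative example, not from the thread):

```julia
# a scalar-only function: no separate "vector" method is ever defined
relu(x::Number) = max(x, zero(x))

relu.([-2, 0, 3])      # [0, 0, 3] -- broadcast supplies the vector version
relu.([-1 2; 3 -4])    # works on matrices too: [0 2; 3 0]
```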

Note that all of this is not really an option in Python or NumPy, because Python loops are too slow — for performance, they must decide in advance which loops they want to vectorize, and implement those in C/Cython/etc.   There are also various code-generation/JIT frameworks for NumPy that can fuse NumPy loops, but again they can only fuse a set of NumPy operations that are selected in advance, not arbitrary user-defined functions.    In consequence, in Python there is not as much practical benefit to be gained from a syntactic broadcast.

Siyi Deng

May 24, 2016, 7:32:59 PM5/24/16
to julia-users
I tend to agree that explicit broadcasting is better than numpy's default behavior.

However, IMO operations on n-d arrays are better defaulted to element-wise, and n-d with scalar should default to element-wise too.

Think about it: a lot of operations are not even commonly defined for 3-d and above, so why waste the most straightforward syntaxes (a+b, a*b, a/b) on special-case matrix operations?
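For reference, this is how the undotted and dotted operators are currently divided up on matrices (a minimal sketch; the values are illustrative):

```julia
A = [1 2; 3 4]
B = [5 6; 7 8]

A * B     # matrix multiplication: [19 22; 43 50]
A .* B    # element-wise product:  [5 12; 21 32]
A / B     # right matrix division (a solve), not element-wise
A ./ B    # element-wise division
A + B     # + is already element-wise for arrays: [6 8; 10 12]
```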

In numerical computing, matrix math is only like 5% of the time and the other 95% is element-wise or tensor-like.

Davide Lasagna

May 24, 2016, 7:53:50 PM5/24/16
to julia-users
Thanks Steven for the clear explanation!

Chris Rackauckas

May 25, 2016, 12:19:15 AM5/25/16
to julia-users
While I don't think it's true that numerical computing is only 5% matrix math, most user-facing code isn't matrix math.

At its core, just about every numerical algorithm is matrix math. Every nonlinearity or connection between terms becomes a matrix, and so every equation comes down to computing Ax or solving A\b. Even solving nonlinear equations becomes function calls to build matrices which we then handle with Ax or A\b (think of Newton/Broyden methods). And even A\b reduces to repeated Ax products in every iterative solver. Higher-dimensional problems use tensor operators which normally generalize matrix multiplication, and so you can usually write them more compactly as matrix multiplication along different dimensions (and solve them efficiently via BLAS). So at its core, building all of these numerical libraries is about repeatedly computing Ax, and so it may be the most used operation (along with +b).

However, outside of methods development, normally you aren't using matrix functions and are rather defining things element-wise. Thus all of the .'ing has been passed off to the user, and so you have to define nasty things like f(x,y,z) = x.^(y.*z).*x. However, I think this change will actually eliminate this issue in many cases, because a user will just pass a function f defined on scalars, and then the algorithm will essentially just use f., and with the ease of this kind of broadcasting, I think many algorithms will "upgrade" to not require vectorized inputs anymore.
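A sketch of the trade-off Chris describes, using the thread's example expression (the scalar definition and test values are illustrative assumptions):

```julia
# the "nasty" fully dotted definition from the post
f_vec(x, y, z) = x.^(y.*z).*x

# the scalar definition; broadcasting happens at the call site instead
f(x, y, z) = x^(y * z) * x

x = [1.0, 2.0]; y = [2.0, 3.0]; z = [1.0, 0.5]
f.(x, y, z) == f_vec(x, y, z)   # same values, but f stays readable
```

An algorithm that accepts a scalar `f` and applies `f.` internally no longer needs its users to hand it pre-vectorized functions.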

This syntax is a very Julia idea, but the main issue will be learning to use this feature. It is very Julia because it is simple but explicit syntax. Julia thrives on the fact that it adds slight bits of explicitness to scripting languages, which it uses for compiler optimization (this is essentially what multiple dispatch is doing). Developers can exploit this explicitness by parsing through your code, reading it almost as pseudocode, and rewriting it to do the operations you want in a much more optimized way (see ParallelAccelerator.jl and its current limitations as an example of something that can really use this explicitness of vectorization in user-defined functions).

But with all of the optimization goodness this can provide, it will likely turn away users who expect sin([4;5;6]) to work without adding the dot the first time they use the function. More effort does need to be put into new-user issues, because in my opinion there is a barrier new users have to climb before they see all of the advantages, and this is just the newest iteration of that problem.