Array order and parallel computing


Martin Rupp

Nov 12, 2012, 5:48:24 AM
to juli...@googlegroups.com
Hi all, some time ago I wrote a small Julia set program in Julia (http://mathemartician.blogspot.de/2012/07/julia-set-in-julia.html). Unfortunately, it does not work anymore. It seems the order in Array(type, dim1, dim2) changed, so I just need to change m = Array(Int64, h, w) to m = Array(Int64, w, h). But the other problem is that the parallel version is not working anymore; it aborts with the errors
exception on 2: parJuliaInit not defined
exception on 3: parJuliaInit not defined
exception on 4: parJuliaInit not defined

It used to work some time ago, and according to http://docs.julialang.org/en/latest/manual/parallel-computing, that error should be gone once parJuliaInit is load'ed from a file, but the error remains. (I checked everything with a git clone from today.)

Stefan Karpinski

Nov 12, 2012, 6:02:05 AM
to Julia Dev
I do love that example. We should definitely make sure that it works again – and maybe even copy it into our examples directory with your permission (and a link to the blog entry).



Tim Holy

Nov 12, 2012, 7:47:26 AM
to juli...@googlegroups.com
On Monday, November 12, 2012 02:48:24 AM Martin Rupp wrote:
> It seems the order in Array(type, dim1, dim2) changed,

Nothing changed in the fundamental representation of arrays or their
constructors; that would be a mind-bogglingly huge change that would break
everything. I'm wondering if the way you were displaying it changed?

Martin Rupp

Nov 12, 2012, 8:32:32 AM
to juli...@googlegroups.com
Hi Tim, hi Stefan,
I looked at the code again, and I think the serial calculation was not right; it seems I copied the wrong version in there, and earlier versions didn't report an error for it. I have fixed the code on my blog now, but I still can't get the parallel code running again. Here's a very small example:

<file partest.jl>
function parInit(T, d, da)
    return Array(Int64, d)
end

<file test.jl>
load("partest.jl")
Dm = darray(parInit, Int64, (10, 10))
print(Dm)

Calling that now results in:
[~/julia/julia] julia -p 2 test.jl
10x10 Int64 DArray:
exception on 2: parInit not defined

Any ideas?
It would be great if the Julia-set example could be incorporated into the examples directory once it's running again; you have my permission.

Martin Rupp

Nov 13, 2012, 10:18:43 AM
to juli...@googlegroups.com
Hi Tim, hi Stefan.
I have fixed the parallel version now. I had to replace load("parJuliaInit.jl") with @everywhere load("parJuliaInit.jl"). I didn't know about the @everywhere macro, and it's not mentioned in the docs (http://docs.julialang.org/en/latest/manual/parallel-computing/); however, the docs do claim that load behaves like @everywhere load ("To make your code available to all processors, the load function will automatically load a source file on all currently available processors"), which is not the case. Perhaps you should update the documentation there.
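
For reference, the fixed version of the small test case from my earlier post looks like this (the only change is the @everywhere prefix):

<file test.jl>
@everywhere load("partest.jl")   # define parInit on every worker process, not just the master
Dm = darray(parInit, Int64, (10, 10))
print(Dm)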

Matthias Schabel

Nov 13, 2012, 12:45:14 PM
to juli...@googlegroups.com
Dear Julians -

I took it upon myself to translate the MATLAB expm function into Julia (attached), test performance, etc. I'm using Version 0.0.0+101822345.r39f0.dirty.

Julia:

L = [1 2 3 4 5 6 7 8 10 12 14 16 32 64 80 96 128]
t = zeros(size(L))

for j = 1:length(L)
    tic()
    for i = 1:1000
        A = rand(L[j], L[j])
        expm(A)
    end
    print("N = ", L[j], " ")   # label so that toc()'s "elapsed time" output reads as below
    t[j] = toc()
end

N = 1 elapsed time: 0.009048938751220703 seconds
N = 2 elapsed time: 0.015383005142211914 seconds
N = 3 elapsed time: 0.017516136169433594 seconds
N = 4 elapsed time: 0.03593897819519043 seconds
N = 5 elapsed time: 0.027956008911132812 seconds
N = 6 elapsed time: 0.056150197982788086 seconds
N = 7 elapsed time: 0.06143689155578613 seconds
N = 8 elapsed time: 0.07320594787597656 seconds
N = 10 elapsed time: 0.1075439453125 seconds
N = 12 elapsed time: 0.16915082931518555 seconds
N = 14 elapsed time: 0.24249696731567383 seconds
N = 16 elapsed time: 0.35841989517211914 seconds
N = 32 elapsed time: 1.3420031070709229 seconds
N = 64 elapsed time: 6.925996780395508 seconds
N = 80 elapsed time: 10.390134811401367 seconds
N = 96 elapsed time: 14.973906993865967 seconds
N = 128 elapsed time: 56.07012605667114 seconds


MATLAB:

L=[1 2 3 4 5 6 7 8 10 12 14 16 32 64 80 96 128]
t=zeros(size(L));

for j=1:length(L)
    tic();
    for i=1:1000
        A=rand(L(j));
        expm(A);
    end;
    t(j)=toc();
    disp(['N = ' num2str(L(j)) ' elapsed time: ' num2str(t(j))]);
end

N = 1 elapsed time: 0.22842
N = 2 elapsed time: 0.22801
N = 3 elapsed time: 0.23025
N = 4 elapsed time: 0.22092
N = 5 elapsed time: 0.20535
N = 6 elapsed time: 0.20805
N = 7 elapsed time: 0.2122
N = 8 elapsed time: 0.22135
N = 10 elapsed time: 0.2373
N = 12 elapsed time: 0.47991
N = 14 elapsed time: 0.26068
N = 16 elapsed time: 0.26273
N = 32 elapsed time: 0.4116
N = 64 elapsed time: 2.1163
N = 80 elapsed time: 2.9543
N = 96 elapsed time: 3.5195
N = 128 elapsed time: 5.1676


C++ mex implementation of expm:

N = 1 elapsed time: 0.015567
N = 2 elapsed time: 0.015793
N = 3 elapsed time: 0.018737
N = 4 elapsed time: 0.0259
N = 5 elapsed time: 0.027777
N = 6 elapsed time: 0.029087
N = 7 elapsed time: 0.031667
N = 8 elapsed time: 0.031717
N = 10 elapsed time: 0.037776
N = 12 elapsed time: 0.04179
N = 14 elapsed time: 0.05091
N = 16 elapsed time: 0.057417
N = 32 elapsed time: 0.20996
N = 64 elapsed time: 1.1903
N = 80 elapsed time: 1.9471
N = 96 elapsed time: 2.5625
N = 128 elapsed time: 4.4311


So Julia is comparable to the C++ mex version up to N~5, and faster than the native MATLAB implementation up to N~12, but then rapidly degrades for large N. In contrast, the native MATLAB code appears to become competitive with C++ for large N. This is what I would expect, as the algorithm becomes dominated by matrix operations (primarily matrix multiplication and \ operations in PadeApproximantOfDegree(...)). Strange that the Julia implementation degrades so much...

Another issue I have is with the choice to use [] for array access instead of (): not only does this make translating from MATLAB painful and error-prone, but it loses the natural relationship between arrays and functions - I would have expected that a language like Julia would have N-dimensional arrays inherit from Function{T}(x1::Int,x2::Int,...,xN::Int) (pseudo Julia code...). This would allow easy definition and use of functions that generate array values on demand, which can be extremely useful in large simulations, etc.

Matthias

[attachment: expm.jl]

Stefan Karpinski

Nov 13, 2012, 1:27:29 PM
to Julia Dev
On Tue, Nov 13, 2012 at 12:45 PM, Matthias Schabel <matt...@schabels.com> wrote:
Another issue I have is with the choice to use [] for array access instead of (): not only does this make translating from MATLAB painful and error-prone, but it loses the natural relationship between arrays and functions - I would have expected that a language like Julia would have N-dimensional arrays inherit from Function{T}(x1::Int,x2::Int,...,xN::Int) (pseudo Julia code...). This would allow easy definition and use of functions that generate array values on demand, which can be extremely useful in large simulations, etc.

Changing array indexing to use square brackets is one of Julia's crucial usability and clarity improvements, and also removes a massive implementation headache for the compiler. (These two are often aligned: compilers and programmers are both trying to read and understand the meaning of code.) Consider the following Matlab code example (credit due to Jeff, who pointed out the badness of this issue to me):

x = a(b(c(d(end))))

Now end, when used like this, means the length of the innermost array that is being indexed into. But which variable is the array here? You can't tell just by reading the code because any of these might be function calls or they might be array indexing. The compiler is in no better shape than a programmer looking at this code. In general, the meaning of this code depends on the runtime values of a, b, c and d. Therefore, the compiler has to emit the equivalent of the following pseudocode:

if isarray(d)
  x = a(b(c(d(length(d)))))
elseif isarray(c)
  x = a(b(c(d(length(c)))))
elseif isarray(b)
  x = a(b(c(d(length(b)))))
elseif isarray(a)
  x = a(b(c(d(length(a)))))
else
  # handle error at run time
end

Ouch. Since Julia distinguishes between function calls and indexing syntactically, there's no such confusion – for the programmer or the compiler. The end corresponds to the innermost square brackets and if there are no such square brackets, it's a compile time syntax error. As cute as it may seem to think of an array or a hash as being a function mapping indices to values, in reality the use cases and implementations of arrays and functions are so radically different that conflating the two causes endless problems for both programmers and compilers.
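
A small illustration of the contrast (a hedged sketch in plain Julia; the names a and f are made up for the example):

a = [10, 20, 30]
f(i) = i - 1

a[end]     # 30: end is the last index of a
a[f(end)]  # 20: end still refers to a, because a[...] is the innermost square bracket
# f(end)   # with no enclosing square brackets, this is a syntax error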

There's a more general principle here – one that dynamic languages have traditionally completely ignored – that the meaning of code should not depend on runtime values, but rather only on things that can be computed before the code runs. In fully static languages, this is strictly enforced by the compiler, but in dynamic languages it is often not the case.

Matthias Schabel

Nov 13, 2012, 2:11:02 PM
to juli...@googlegroups.com
> Changing array indexing to use square brackets is one of Julia's crucial usability and clarity improvements, and also removes a massive implementation headache for the compiler. (These two are often aligned: compilers and programmers are both trying to read and understand the meaning of code.) Consider the following Matlab code example (credit due to Jeff, who pointed out the badness of this issue to me):
>
> x = a(b(c(d(end))))
>
> Now end, when used like this, means the length of the innermost array that is being indexed into. But which variable is the array here? You can't tell just by reading the code because any of these might be function calls or they might be array indexing. The compiler is in no better shape than a programmer looking at this code. In general, the meaning of this code depends on the runtime values of a, b, c and d. Therefore, the compiler has to emit the equivalent of the following pseudocode:
>
> if isarray(d)
> x = a(b(c(d(length(d)))))
> elseif isarray(c)
> x = a(b(c(d(length(c)))))
> elseif isarray(b)
> x = a(b(c(d(length(b)))))
> elseif isarray(a)
> x = a(b(c(d(length(a)))))
> else
> # handle error at run time
> end
>
> Ouch. Since Julia distinguishes between function calls and indexing syntactically, there's no such confusion – for the programmer or the compiler. The end corresponds to the innermost square brackets and if there are no such square brackets, it's a compile time syntax error. As cute as it may seem to think of an array or a hash as being a function mapping indices to values, in reality the use cases and implementations of arrays and functions are so radically different that conflating the two causes endless problems for both programmers and compilers.

I'm not convinced that this example makes sense; I would associate end with d only, not percolate it up through the function call tree. I would expect the compiler to turn this into something like

operator()(a,operator()(b,operator()(c,operator()(d,end))))

Anyway, in MATLAB one gets strange results in this case:

b = @(idx)(idx)
a = 5:-1:1

b(end) -> 1
a(b(end)) -> 5
b(a(end)) -> 1

So, aside from b(end) giving a somewhat weird value of 1 (I would expect an error), this looks like it is associating with the inner call.

I would expect "end" to only be supported by types that have array-like behavior, and to be a compile-time error otherwise, using multiple dispatch:

operator()(A::Array{T,1}, idx::Index)
    return A.data[idx] # [] used in C-like context for linear array indexing
end

operator()(A::Array{T,1}, end::End)
    return A(length(A))
end

Tangentially, in certain prominent domains (http://en.wikipedia.org/wiki/Matrix_mechanics) the matrix-function relationship is a well-accepted correspondence, not just a "cute" way of thinking about arrays...

> There's a more general principle here – one that dynamic languages have traditionally completely ignored – that the meaning of code should not depend on runtime values, but rather only on things that can be computed before the code runs. In fully static languages, this is strictly enforced by the compiler, but in dynamic languages it is often not the case.

I am completely on your side - I would love to get the performance, granularity and expressiveness of C++ with the dynamism and simplicity of MATLAB, and that's what I hope Julia can become. I am neither a computer scientist nor a compiler writer, so I can only comment on my experiences as a frequent user of technical computing systems and an occasional writer of libraries... Please don't take my comments as criticisms. FWIW, I'm just trying to provide a running commentary on my experiences with and opinions on Julia as I play with it.

Best,

Matthias

Stefan Karpinski

Nov 13, 2012, 2:35:29 PM
to Julia Dev
No offense taken and the feedback is helpful. I was only justifying the design choice. I'm sure the performance issue will spark some discussion as well. Don't really have time to get into that at the moment...




John Cowan

Nov 13, 2012, 2:44:55 PM
to juli...@googlegroups.com
On Tue, Nov 13, 2012 at 1:27 PM, Stefan Karpinski <ste...@karpinski.org> wrote:
 
There's a more general principle here – one that dynamic languages have traditionally completely ignored – that the meaning of code should not depend on runtime values, but rather only on things that can be computed before the code runs. In fully static languages, this is strictly enforced by the compiler, but in dynamic languages it is often not the case.

I think another way of expressing that is "Ad hoc polymorphism is overall a Bad Thing."  In Scheme, which is in some ways a very static dynamic language, the only instance of ad hoc polymorphism is that the arithmetic procedures work on both exact and inexact (i.e. floating-point) numbers.  All other procedures are either monomorphic (work on only one type) or universally polymorphic (work on any type because they don't care about type).

For example, the "vector-set!" procedure, which mutates a vector, is monomorphic on the vector itself (there is only one type of vector in Scheme), monomorphic on the index (it must be an exact integer), and universally polymorphic on the new value (it can be anything).  There is no reason in principle for the use of universal rather than inclusion polymorphism, except that it happens that (R5RS and R7RS-small) Scheme has no subtypes.

Now Julia does have ad hoc polymorphism, but there seems to be a fairly strong cultural preference for using it only in parametric ways.  For example, + does not mean "concatenate strings", although it would be perfectly licit to add such a method to it.  Consequently, even though the *compiler* doesn't know that + is always commutative and associative (modulo floating-point hassles), the programmer does.

--
GMail doesn't have rotating .sigs, but you can see mine at http://www.ccil.org/~cowan/signatures

Jeff Bezanson

Nov 13, 2012, 2:59:26 PM
to juli...@googlegroups.com
Thank you, we can track the performance of this and try to improve it.
I will also just add that you can add methods to [] (ref/assign) to get functionally-defined arrays.
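
For example, a rough sketch of a functionally-defined array (hedged: LazyArray is a made-up name; ref was later renamed getindex, and depending on the Julia version you may need import Base.ref before adding the method):

type LazyArray
    f::Function
end

ref(a::LazyArray, i::Int) = a.f(i)   # makes a[i] compute f(i) on demand

squares = LazyArray(i -> i^2)
println(squares[4])                  # 16, computed only when indexed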



Matthias Schabel

Nov 13, 2012, 3:38:48 PM
to juli...@googlegroups.com
Thanks... Any pointers showing how to add methods to []? Also, how does one find existing methods? Unlike +, I can't just type [] or () at the command line to get a listing...


Tom Short

Nov 13, 2012, 3:55:38 PM
to juli...@googlegroups.com
On Tue, Nov 13, 2012 at 3:38 PM, Matthias Schabel <matt...@schabels.com> wrote:
> Thanks... Any pointers showing how to add methods to []? Also, how does one
> find existing methods? Unlike +, I can't just type [] or () at the command
> line to get a listing...
>
Type "ref" and/or "assign" at the command line (without the quotes).

a[3] is equivalent to ref(a, 3).
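
A quick sketch of the correspondence (0.x-era names; as far as I recall, assign takes the value before the index):

a = [1, 2, 3]
ref(a, 3)        # same as a[3], returns 3
assign(a, 7, 3)  # same as a[3] = 7
a                # now [1, 2, 7]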

Matthias Schabel

Nov 13, 2012, 4:05:00 PM
to juli...@googlegroups.com
> Type "ref" and/or "assign" at the command line (without the quotes).
>
> a[3] is equivalent to ref(a, 3).

Thanks - do you know what the equivalents for a(x) and a.x are?

Matthias

Stefan Karpinski

Nov 13, 2012, 4:36:09 PM
to Julia Dev
On Tue, Nov 13, 2012 at 4:05 PM, Matthias Schabel <matt...@schabels.com> wrote:
Thanks - do you know what the equivalents for a(x) and a.x are?

Those cannot be overloaded.

Matthias Schabel

Nov 13, 2012, 8:18:15 PM
to juli...@googlegroups.com
Is there any way to achieve a C++/Java-like OO syntax, or do all method invocations go through function calls?


On Tue, Nov 13, 2012 at 4:05 PM, Matthias Schabel <matt...@schabels.com> wrote:
Thanks - do you know what the equivalents for a(x) and a.x are?

Those cannot be overloaded.


Jeff Bezanson

Nov 13, 2012, 9:19:11 PM
to juli...@googlegroups.com
Method invocations are through function call syntax. But a.f(x) is valid syntax, and calls the function stored in field f of object a. This is not a "Julia method call", but it is sometimes useful.
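
A tiny sketch of that (hypothetical type and field names):

type Thing
    f::Function
end

t = Thing(x -> 2*x)
t.f(3)   # looks like a method call, but simply calls the function stored in field f; returns 6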



Viral Shah

Nov 14, 2012, 12:59:55 AM
to juli...@googlegroups.com
Could you post this as an issue so that we do not lose it, and perhaps even put the code in a gist on github?

Doug Bates wrote expm, and I believe he is traveling this week. I'll also look into it later this week.

-viral

Viral Shah

Nov 14, 2012, 1:04:32 AM
to juli...@googlegroups.com
Martin,

I tried the example from the manual, and it works. I do get the same error as yours otherwise. I suspect that the case of nested loads is one that does not work, and hence requires @everywhere.

This should certainly be filed as an issue. It would be great if you could do that.


-viral

Jameson Nash

Nov 14, 2012, 2:18:24 AM
to juli...@googlegroups.com
I don't see nearly the same performance gap (see below for Julia timings; MATLAB timings were in the same ballpark). However, I'm seeing an issue that is inexplicable and definitely fishy: in the libuv fork, this same code executes in ~10 seconds for N=17 (128), occupying 100% of one processor, whereas on the main fork it pegs all 4 cores of my i7 and takes 50% longer.

I tried using MKL from MATLAB 2011b; however, it lacked dgesdd_ and DGEMM crashed (I'm downloading MATLAB 2012b now).

** Julia timings of test code: **
N=1 elapsed time: 0.01194000244140625 seconds
N=2 elapsed time: 0.01686692237854004 seconds
N=3 elapsed time: 0.03225588798522949 seconds
N=4 elapsed time: 0.03948688507080078 seconds
N=5 elapsed time: 0.05979800224304199 seconds
N=6 elapsed time: 0.042874813079833984 seconds
N=7 elapsed time: 0.07765388488769531 seconds
N=8 elapsed time: 0.07561302185058594 seconds
N=9 elapsed time: 0.10958385467529297 seconds
N=10 elapsed time: 0.14064788818359375 seconds
N=11 elapsed time: 0.14958906173706055 seconds
N=12 elapsed time: 0.32268595695495605 seconds
N=13 elapsed time: 0.6324131488800049 seconds
N=14 elapsed time: 2.3899431228637695 seconds
N=15 elapsed time: 3.87203311920166 seconds
N=16 elapsed time: 5.7606799602508545 seconds
N=17 elapsed time: 14.2980318069458 seconds

** Errors if I try using an old version of MKL **
symbol could not be found dgesdd_ (-1): dlsym(0x102a69940, dgesdd_): symbol not found
symbol could not be found dgesdd_ (-1): dlsym(0x102a69940, dgesdd_): symbol not found

MKL ERROR: Parameter 10 was incorrect on entry to DGEMM

MKL ERROR: Parameter 10 was incorrect on entry to DGEMM

MKL ERROR: Parameter 10 was incorrect on entry to DGEMM

MKL ERROR: Parameter 10 was incorrect on entry to DGEMM
Segmentation fault: 11

** Why I hate Matlab's implementation of () for array indexing **
>> [1 2 3](2)
[1 2 3](2)
|
Error: Unbalanced or unexpected parenthesis or
bracket.

Viral Shah

Nov 14, 2012, 7:00:42 AM
to juli...@googlegroups.com
dgesdd is relatively new - I think LAPACK 3.3. Presumably, newer MKL releases ship with a LAPACK from after the 3.3 release.

-viral

Carlo Baldassi

Nov 14, 2012, 9:04:42 AM
to juli...@googlegroups.com
For the record, this is the performance I get on two different machines (Julia uses OpenBLAS on both, with the native Julia expm implementation; both run Linux; the MATLAB version is R2012a):

1) Intel i5 M430 2.27GHz, 2 cores (4 logical):

#N   Julia      Matlab
-----------------------
1    0.0024488  0.35688
2    0.199072   0.30799
3    0.020838   0.34319
4    0.042666   0.20448
5    0.027791   0.20533
6    0.057714   0.20133
7    0.0864239  0.20578
8    0.05966    0.21359
10   0.0847881  0.2338
12   0.120878   0.24585
14   0.145662   0.2598
16   0.18716    0.27623
32   0.655307   0.50556
64   2.29593    1.3291
80   3.9531     2.1365
96   6.03715    3.0895
128  11.6631    5.9836

2) AMD Opteron 6282, 2.6 GHz, 8 cores (64 logical)

#N   Julia        Matlab
-------------------------
1    0.000439882  0.16423
2    0.0422549    0.14235
3    0.027849     0.14447
4    0.0373981    0.12857
5    0.064539     0.13053
6    0.0559471    0.1325
7    0.043076     0.13685
8    0.078299     0.13767
10   0.091233     0.15247
12   0.125161     0.15868
14   0.155534     0.17247
16   0.203715     0.17603
32   0.649036     0.81484
64   2.71285      2.0239
80   4.03572      2.9162
96   5.74543      4.1006
128  10.9816      7.5335

The readings are stable across different trials.



Viral Shah

Nov 14, 2012, 9:19:12 AM
to juli...@googlegroups.com
Yes, I see similar performance across Julia+OpenBLAS and MATLAB R2012a.

OpenBLAS is probably using multiple threads. Forcing it to use only 1 thread, I get slightly better performance: at n=128, Julia takes 8.8 seconds, whereas MATLAB takes 5.6 seconds.

-viral

Viral Shah

Nov 14, 2012, 9:29:43 AM
to juli...@googlegroups.com
There should be some performance to gain from directly calling BLAS and LAPACK routines instead of using \, / and *. The polyalgorithms in there probably end up doing unnecessary work, and some copies can also be avoided.

I am reasonably sure that should put us on par with MATLAB and perhaps even a bit faster.

-viral



