tic; X = rand(7000); toc;
Elapsed time is 0.579093 seconds.
tic; XX = X.^2; toc;
Elapsed time is 0.114737 seconds.
@time X = rand(7000,7000);
elapsed time: 0.114418731 seconds (392000128 bytes allocated)
@time XX = X.^2;
elapsed time: 0.369641268 seconds (392000224 bytes allocated)
@time XX = X.*X;
elapsed time: 0.146059577 seconds (392000968 bytes allocated)
function doCalc(x::Array{Float64,2})
    XX = Array(Float64, 7000, 7000)
    for j = 1:7000, i = 1:7000
        @inbounds XX[i,j] = x[i,j]*x[i,j]
    end
    XX
end
@time XX=doCalc(X);
Octave seems to be doing a pretty good job for this type of calculation. If you want to squeeze a bit more performance out of Julia, you can try to use explicit loops (devectorize as in http://julialang.org/blog/2013/09/fast-numeric/). Then you might also remove bounds checking in a loop for faster performance.
Try this function:
function doCalc(x::Array{Float64,2})
    XX = Array(Float64, 7000, 7000)
    for j = 1:7000, i = 1:7000
        @inbounds XX[i,j] = x[i,j]*x[i,j]
    end
    XX
end
Followed by
.^(x::StridedArray, y::Number) = reshape([ x[i]^y for i=1:length(x) ], size(x))
julia> x = rand(7000,7000);
julia> @time x.^2;
elapsed time: 1.340131226 seconds (392000256 bytes allocated)
And in Octave....
>> x=rand(7000);
>> tic;y=x.^2;toc
Elapsed time is 0.201613 seconds.
Comprehensions might be the culprit, or simply the use of the generic x^y. Here's an alternate implementation I threw together:
julia> function .^{T}(x::Array{T}, y::Number)
           z = *(size(x)...)
           new_x = Array(T, z)
           tmp_x = reshape(x, z)
           for i in 1:z
               @inbounds if y == 2
                   new_x[i] = tmp_x[i]*tmp_x[i]
               else
                   new_x[i] = tmp_x[i]^y
               end
           end
           reshape(new_x, size(x))
       end
.^ (generic function with 24 methods)
julia> @time x.^2;
elapsed time: 0.248816016 seconds (392000360 bytes allocated, 4.73% gc time)
Close....
On Monday, September 8, 2014 7:37:52 AM UTC-4, Mohammed El-Beltagy wrote:
.^(x::StridedArray, y::Number) = reshape([ x[i]^y for i=1:length(x) ], size(x))

How is that made less vectorized?
sumabs2(X, 1)

Calm down. It's very easy to match Octave here, by writing the standard library in C. Julia chose not to go that route. Of course everyone would like the concise vectorized syntax to be as fast as the for-loop syntax. If it were easy to do that *without* writing the standard library in C, it would've been done already. In the meantime it isn't easy and hasn't been done yet, so when someone asks how to write code that's faster than Octave, this is the answer.
There have been various LLVM bugs regarding automatic optimizations of converting small integer powers to multiplications. "if y == 2" code really shouldn't be necessary here, LLVM is smarter than that but sometimes fickle and buggy under the demands Julia puts on it.
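For what it's worth, later Julia releases address this at the language level rather than relying on LLVM: a literal exponent is dispatched through Base.literal_pow at lowering time. A minimal sketch in modern (1.x) syntax, showing why the `if y == 2` branch is unnecessary there:

```julia
# Exponent passed as a runtime value: the method cannot specialize on y,
# so each element goes through the generic power routine.
pow_runtime(v::Vector{Float64}, y) = [x^y for x in v]

# Literal exponent: `x^2` is lowered to Base.literal_pow(^, x, Val(2)),
# which for small integer powers compiles down to plain multiplication,
# so no `if y == 2` branch is needed in user code or at runtime.
pow_literal(v::Vector{Float64}) = [x^2 for x in v]

v = [1.0, 2.0, 3.0]
pow_runtime(v, 2) == pow_literal(v)   # both give [1.0, 4.0, 9.0]
```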
And in this case the answer is clearly a reduction for which Julia has a very good set of tools to express efficiently. If your output is much less data than your input, who cares how fast you can calculate temporaries that you don't need to save?
Expectations based on the way Octave/Matlab work are really not applicable to a system that works completely differently.
X = rand(7000,7000);
@time d = sum(X.^2, 1);
elapsed time: 0.573125833 seconds (392056672 bytes allocated, 2.25% gc time)
@time d = sum(X.*X, 1);
elapsed time: 0.178715901 seconds (392057080 bytes allocated, 14.06% gc time)
@time d = sumabs2(X, 1);
elapsed time: 0.067431808 seconds (56496 bytes allocated)
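The sumabs2 numbers come from a reduction that never materializes the squared matrix. A hand-rolled version makes that explicit; a sketch, noting that current Julia spells the library call `sum(abs2, X, dims=1)` after `sumabs2` was removed:

```julia
# Column-wise sum of squares that allocates only the 1×n result,
# never a full squared copy of X -- the same idea behind sumabs2.
function colsumsq(X::Matrix{Float64})
    m, n = size(X)
    d = zeros(1, n)
    for j in 1:n
        s = 0.0
        @inbounds for i in 1:m
            s += X[i, j] * X[i, j]
        end
        d[1, j] = s
    end
    d
end

colsumsq([1.0 2.0; 3.0 4.0])   # 1×2 Matrix: [10.0 20.0]
```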
X = rand(7000);
tic; d = sum(X.^2); toc;
Elapsed time is 0.167578 seconds.
@time d = sumabs2(X[:,1001:end], 1);
elapsed time: 0.175333366 seconds (336048688 bytes allocated, 7.01% gc time)
In Matlab, until quite recently X.^2 was slower than X.*X. It was that way for
something like 25 years before they fixed it.
@time X.^2;
elapsed time: 0.511988142 seconds (392000256 bytes allocated, 2.52% gc time)
@time X.^2.0;
elapsed time: 0.411791612 seconds (392000256 bytes allocated, 3.12% gc time)
      X = rand(7000);
tic; sumsq(X); toc;
Elapsed time is 0.0616651 seconds.
@time X = rand(7000,7000);
elapsed time: 0.285218597 seconds (392000160 bytes allocated)
@time sumabs2(X, 1);
elapsed time: 0.05705666 seconds (56496 bytes allocated)
@time X = rand(7000,7000);
is about 2.5 times slower in Julia 0.3 than it was in Julia 0.2 ...
@time X = rand(7000,7000);
elapsed time: 0.114418731 seconds (392000128 bytes allocated)
                                      Hello Andreas,
Thanks for the tip. I'll check it out. Thumbs up for the 0.4!
Jan
On 09.09.2014 17:04, Andreas Noack wrote:
If you need the speed now, you can try one of the packages ArrayViews or ArrayViewsAPL. It is something similar to the functionality in these packages that we are trying to include in base.
Kind regards
Andreas Noack
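To illustrate the idea behind those packages: a slice like X[:, 1001:end] copies its data, while a view aliases the parent array, so a reduction over it allocates almost nothing. A sketch in modern syntax, where this functionality did end up in Base as `view`/`@view` (the thread's `sumabs2` is spelled `sum(abs2, ...)` there):

```julia
# A slice copies; a view references the parent array in place.
X = rand(1000, 2000)

d_copy = sum(abs2, X[:, 1001:end], dims=1)         # materializes a 1000×1000 copy
d_view = sum(abs2, @view(X[:, 1001:end]), dims=1)  # no copy of the data

d_copy ≈ d_view   # same reduction, minus the slice allocation
```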
I think the reason for the slowdown in rand since 0.2 is this: right now we are filling the array one element at a time, which is not efficient, but unfortunately it is our best option right now. In applications where you draw one random variate at a time there shouldn't be a difference.
Kind regards
Andreas Noack
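One way to sidestep the per-call allocation cost today is to preallocate and refill: `rand!` overwrites an existing buffer instead of allocating a fresh array on every call. A sketch in modern syntax, where `rand!` lives in the Random standard library:

```julia
using Random

# Allocate the buffer once...
X = Array{Float64}(undef, 1000, 1000)

# ...then refill it in place; no new array is allocated, so the GC
# pressure from repeated rand(7000,7000)-style calls disappears.
rand!(X)

all(x -> 0.0 <= x < 1.0, X)   # uniform draws in [0, 1)
```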
I wonder if we should provide access to DSFMT's random array generation, so that one can use an array generator. The requirements are that one has to generate at least 384 random numbers at a time, and the size of the array must be even.
We should not allow this with the global seed, and it can be exposed through a randarray!() function. We can even avoid exporting this function by default, since there are many conditions it needs to satisfy, but it gives really high performance.
-viral
Yes, it is not going to be exported if we do it.
-viral