On Wednesday, May 11, 2016, at 23:03 -0700, Anonymous wrote:
> In response to both Kristoffer and Keno's timely responses,
>
> Originally I just did a simple @time test of the form
> Matrix .* horizontal vector
>
> and then tested the same thing with for loops, and the for loops were
> way faster (and used way less memory)
>
> However, I just devectorized one of my algorithms and ran an @time
> comparison, and the vectorized version was actually twice as fast as
> the devectorized version; the vectorized version did, however, use
> way more memory. Clearly I don't really understand the specifics of what
> makes code slow, and in particular how vectorized code compares to
> devectorized code. Vectorized code does seem to use a lot more
> memory, but clearly for my algorithm it nevertheless runs faster than
> the devectorized version. Is there a reference I could look at that
> explains this to someone with a background in math but not much
> knowledge of computer architecture?
I don't know about a reference, but I suspect this is due to BLAS.
Vectorized versions of linear algebra operations like matrix
multiplication are highly optimized and run several threads in
parallel. By contrast, your devectorized code isn't carefully tuned
for a specific processor model and runs on a single CPU core (Julia
will soon support running several threads itself; see also [1]).
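To make the gap concrete, here is a minimal sketch comparing BLAS-backed matrix multiplication with a hand-written triple loop (the function name `naive_mul` and the matrix sizes are just illustrative; exact timings will vary by machine and BLAS build):

```julia
# Naive, single-threaded matrix multiplication, not tuned for any CPU.
function naive_mul(A, B)
    m, k = size(A)
    n = size(B, 2)
    C = zeros(eltype(A), m, n)
    for j in 1:n, p in 1:k, i in 1:m   # column-major-friendly loop order
        C[i, j] += A[i, p] * B[p, j]
    end
    return C
end

A = rand(500, 500)
B = rand(500, 500)

@time A * B           # dispatches to optimized, multithreaded BLAS (dgemm)
@time naive_mul(A, B) # untuned single-core loop; typically far slower
```

Both compute the same product; the difference is entirely in how well the inner kernel exploits the hardware.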
So depending on the particular operations you're running, the
vectorized form can be faster even though it allocates more memory. In
general, BLAS will likely win for expensive operations on large
matrices. OTOH, it's better to devectorize code that successively
performs several simple elementwise operations on an array, because
each operation currently allocates a new array for its result (this
may well change with [2]).
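Here is a rough sketch of that second case (the function names are made up for illustration; how many temporaries the vectorized form allocates depends on your Julia version, since [2] concerns fusing such operations):

```julia
# Vectorized: each elementwise step, written separately, allocates
# its own full-size temporary array before the final result.
function vectorized(x)
    t1 = 2 .* x        # allocates a temporary
    t2 = t1 .+ 1       # allocates another
    return sqrt.(t2)   # and one more for the result
end

# Devectorized: one output allocation and a single pass over the data.
function devectorized(x)
    y = similar(x)
    for i in eachindex(x)
        y[i] = sqrt(2 * x[i] + 1)
    end
    return y
end

x = rand(10^6)
@time vectorized(x)
@time devectorized(x)
```

For cheap operations like these, the extra allocations and memory traffic in the vectorized form usually dominate, which is why the loop wins here even though BLAS wins for matrix multiplication.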
Regards
1: http://julialang.org/blog/2016/03/parallelaccelerator
2: https://github.com/JuliaLang/julia/issues/16285