On Monday, January 28, 2013 07:43:49 AM John Myles White wrote:
> This is really something that needs improvement. I entirely agree that
> vectorized code is easier to read and write.
Often, but not always. Sometimes with Matlab, which generally has poor
performance unless you vectorize, I had to go through contortions to figure out
how to vectorize my computation.
Certainly this is something that will, in the fullness of time, be addressed.
I think there are a couple of efforts underway; one that is publicly available
is Krys' "DeVec" framework:
https://github.com/kk49/julia-delayed-matrix
Obviously it would be better to have automatic elision of temporaries built
into core Julia; I don't know about other people's plans, and I'm not working
in this area myself, but joining one of these efforts might make it happen
sooner.
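To make "temporaries" concrete, here is a minimal sketch contrasting a
vectorized expression with a hand-devectorized loop. It is not taken from DeVec
or from any planned core feature, and the function names are made up for
illustration:

```julia
# Hypothetical names, for illustration only.

# Vectorized: each operation allocates a fresh array for its result.
function axpy_vectorized(a, x, y)
    t = a * x      # temporary array holding a*x
    return t + y   # second allocation for the final result
end

# Devectorized: a single pass over the data, writing into an output array
# that the caller allocated ahead of time. No temporaries are created.
function axpy_devectorized(out, a, x, y)
    for i in 1:length(x)
        out[i] = a * x[i] + y[i]
    end
    return out
end

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = similar(x)                    # allocate the output once
axpy_devectorized(out, 2.0, x, y)   # fills `out` in place
```

Automatic elision of temporaries would, roughly speaking, let you write the
first form and get the memory behavior of the second.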
Another resource, not immediately visible, is that many of the matrix routines
have a "preallocated output" form. For example, "A*B" calls A_mul_B, and
A_mul_B has both
    C = A_mul_B(A, B)
and
    A_mul_B(C, A, B)
forms. The latter lets you manage the memory for the output explicitly (i.e.,
avoid temporaries); allocating temporaries is usually the main source of the
performance difference between vectorized and devectorized code.
A_mul_B(C, A, B) returns C, so you can compose it with other such routines. Not
as pretty as A*B, of course, but it gets you the performance without having to
dive down into low-level Lapack.
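
A minimal sketch of the preallocated-output pattern. One caveat: the spelling of
the in-place routine depends on the Julia version. This sketch assumes a Julia
1.x release, where the in-place multiply is mul!(C, A, B) from the LinearAlgebra
standard library rather than A_mul_B(C, A, B). The matrices are made up for
illustration:

```julia
using LinearAlgebra   # assumption: Julia 1.x, where the in-place form is mul!

A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]

# Allocate the outputs once, up front.
C = zeros(2, 2)
D = zeros(2, 2)

# mul!(C, A, B) writes A*B into C and returns C, so calls compose:
# the inner call fills C, the outer call fills D with (A*B)*A.
mul!(D, mul!(C, A, B), A)
```

In a loop, C and D would be allocated once outside the loop and reused on every
iteration, so the multiplications themselves allocate nothing.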
--Tim