How to apply a vector on the diagonal of a matrix?


paul analyst

May 20, 2014, 9:04:33 AM5/20/14
to julia...@googlegroups.com
a = rand(5)
b = zeros(5,5)
How do I apply a on the diagonal of b?
Paul

Oliver Lylloff

May 20, 2014, 10:17:11 AM5/20/14
to julia...@googlegroups.com
Hi Paul, 

if b is just zeros, then b = diagm(a).
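
For illustration, a minimal sketch (note: on the 0.x versions discussed in this thread diagm was in Base; in modern Julia it lives in the LinearAlgebra standard library and takes a Pair for the diagonal offset):

```julia
using LinearAlgebra  # not needed on Julia 0.x, where diagm was in Base

a = rand(5)
b = diagm(0 => a)      # 5×5 dense matrix with a on the main diagonal
@assert diag(b) == a   # read the diagonal back
@assert b[1, 2] == 0.0 # off-diagonal entries are zero
```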

Best,
Oliver

paul analyst

May 20, 2014, 10:48:30 AM5/20/14
to julia...@googlegroups.com
b is not always zeros...
c = diagm(a)
b = b + c
Thx
Paul

Oliver Lylloff

May 20, 2014, 11:18:33 AM5/20/14
to julia...@googlegroups.com
Yes, and for larger arrays it might be beneficial to write it as a loop:

a = rand(1000);
b = rand(1000,1000);
for i = 1:length(a)
    b[i,i] = b[i,i] + a[i];
end

Paul Analyst

May 20, 2014, 1:22:51 PM5/20/14
to julia...@googlegroups.com
I have arrays of about 10k×10k. Is it faster with a loop?
Paul
On 2014-05-20 17:18, Oliver Lylloff wrote:

Tim Holy

May 20, 2014, 3:28:17 PM5/20/14
to julia...@googlegroups.com
On Tuesday, May 20, 2014 07:22:51 PM Paul Analyst wrote:
> I have arrays about 10k*10k. Is faster by loop?

I think you'll find that loops are a generic answer to many of the questions
you've asked on this list. While in some languages loops are slow and you
_have_ to know about and use some high-level function to get good performance,
that doesn't apply to Julia. Indeed, many of Julia's high-level functions are
written in terms of plain, entirely non-magical loops. Loops you write
yourself will generally be just as fast (though keep the "performance tips"
section of the manual firmly in mind). Indeed, sometimes the loop is better
than the high-level function, because you can do your calculation without
creating temporaries.

Once you get used to this idea, I think you'll find it very freeing, because
you don't need to know about some clever way to leverage a high-level
function; if the calculation is very simple (as it is here), just write a 3
line loop and move on.

Best,
--Tim

gael....@gmail.com

May 20, 2014, 10:46:28 PM5/20/14
to julia...@googlegroups.com
While I can't not agree with this ;), I'd like to state that:
1) High-level functions might leverage clever algorithms faster than plain loops (best example coming to mind: dot).
2) Vectorized code is always easier to understand, write, fix and maintain, because the intent is clear from the start. Your equation is written just as it was on paper, not buried within nested loops among many explicit indices.
Moreover, Julia will get better at devectorizing and at avoiding temporaries.

Therefore I would recommend using explicit loops only when *proven* to provide a necessary speedup or a memory gain.

In this case diagm works just as intended, and saving 20 ns on matrix construction just seems silly. I understand your point perfectly, but explicit loops have downsides too.

Profile first, optimize later... Comment now! ;)

Joey Huchette

May 20, 2014, 11:15:27 PM5/20/14
to julia...@googlegroups.com
Perhaps a setdiag!(b,a) function would be handy.
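
No such function exists in Base under that name; a minimal sketch of what Joey describes (setdiag! here is hypothetical) might be:

```julia
# Hypothetical helper: overwrite the main diagonal of b with a, in place.
function setdiag!(b::AbstractMatrix, a::AbstractVector)
    n = min(size(b, 1), size(b, 2))
    length(a) == n || throw(DimensionMismatch("need length(a) == $n"))
    for i in 1:n
        b[i, i] = a[i]
    end
    return b
end
```

For the add-to-diagonal case in this thread, the loop body would be b[i, i] += a[i] instead.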

Andreas Noack Jensen

May 21, 2014, 1:33:01 AM5/21/14
to julia...@googlegroups.com
Please consider b += Diagonal(c) instead of diagm. Diagonal(c) only stores the diagonal elements but works like diagm(c) for matrix arithmetic.

Vectorized code is always easier to understand

That is a strong statement. I have vectorised MATLAB code with repmat and meshgrids that is completely unreadable, but would be fairly easy to follow if written as loops. I really enjoy that I can just write a loop in Julia without slow execution.
--
Best regards

Andreas Noack Jensen

Tim Holy

May 21, 2014, 7:21:44 AM5/21/14
to julia...@googlegroups.com
On Wednesday, May 21, 2014 07:33:01 AM Andreas Noack Jensen wrote:
>> Vectorized code is always easier to understand
>
> That is a strong statement. I have vectorised MATLAB code with repmat and
> meshgrids that is completely unreadable, but would be fairly easy to follow
> if written as loops. I really enjoy that I can just write a loop in Julia
> without slow execution.

Agreed 100%---I've been there more times than I can count. Moreover, if a user
keeps asking "is there a function in the standard library to do X?", at a
certain point it becomes very liberating to realize they don't _have_ to
master a (large) standard library in order to make progress on their work.

--Tim

Oliver Lylloff

May 21, 2014, 7:28:46 AM5/21/14
to julia...@googlegroups.com
That's a great feature Andreas!

Do you mean like this:

a = rand(5)
b = zeros(5,5)
b += Diagonal(a)

Because I get a no method +(Array{Float64,2},Diagonal{Float64}) ...

I'm on 0.2 and a fairly old commit right now, so that might be the reason.
Julia Version 0.2.0
Commit 05c6461 (2013-11-16 23:44 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin12.5.0)
  WORD_SIZE: 64
  BLAS: libgfortblas
  LAPACK: liblapack
  LIBM: libopenlibm

-- Oliver

Ivar Nesje

May 21, 2014, 7:35:42 AM5/21/14
to julia...@googlegroups.com
Yes, that only works on the 0.3-prerelease versions. Hopefully it won't be too long until a final version is released.
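
For readers on current versions: a sketch of the same pattern on Julia 1.x, where Diagonal lives in LinearAlgebra:

```julia
using LinearAlgebra

a = rand(5)
b = zeros(5, 5)
b += Diagonal(a)       # Diagonal stores only the 5 diagonal entries
@assert diag(b) == a
@assert Diagonal(a) * ones(5) == a  # acts like the full matrix in arithmetic
```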

Ivar

Andreas Lobinger

May 21, 2014, 8:09:57 AM5/21/14
to julia...@googlegroups.com
Hello colleague,


On Wednesday, May 21, 2014 7:33:01 AM UTC+2, Andreas Noack Jensen wrote:
Please consider b += Diagonal(c) instead of diagm. Diagonal(c) only stores the diagonal elements but works like diagm(c) for matrix arithmetic

Vectorized code is always easier to understand

That is a strong statement. I have vectorised MATLAB code with repmat and meshgrids that is completely unreadable, but would be fairly easy to follow if written as loops. I really enjoy that I can just write a loop in Julia without slow execution.

If you write code in Matlab (or similar) that's using repmat (nowadays bsxfun) and meshgrids (and reshape), you are operating on the wrong side of vectorising your code. Most likely (and there might be good reasons for that) your data is not in the 'right' shape, so you reorder more than you calculate. For problems like this, explicit loops are often the better way (so straightforward in julia), OR it takes some thinking about why the data needs to be reordered for certain operations.

The right side of vectorized code is the SIMD thinking, which does operations (sometimes linalg) on Matrix/Vector forms and in the best case leads to one-liners like

M += ((A*B) + (d*C) + (E .* M0))

which are easy to read/write/understand.

Expressions like this seem to create an incredible amount of intermediates and temporary allocations in julia, while actually the output array (M) is already allocated and the + and * operations just need to be executed with the 'right' indexing.

Some people tend to solve this by writing explicit loops, turning the above one-liner into half a page of code.

Tobias Knopp

May 21, 2014, 8:22:19 AM5/21/14
to julia...@googlegroups.com
But is

M += ((A*B) + (d*C) + (E .* M0))

any slower in Julia than in Matlab? Avoiding the temporaries is a lot less trivial than it might look, and I am not sure that Matlab does it.

Tim Holy

May 21, 2014, 8:55:34 AM5/21/14
to julia...@googlegroups.com
I agree that vectorizing is very nice in some circumstances, and it will be
great if Julia can get to the point of not introducing so many temporaries.
(Like Tobi, I am not convinced this is an easy problem to solve.)

But there are plenty of cases where loops are clearer even when vectorization
is both possible and efficient. For example, surely you can't tell me that

L = similar(A)
m,n = size(A)
L[2:m-1,2:n-1] = A[3:m,2:n-1] + A[1:m-2,2:n-1] + A[2:m-1,3:n] +
A[2:m-1,1:n-2] - 4A[2:m-1, 2:n-1]

is easier to read and understand than

L = similar(A)
for j=2:size(A,2)-1, i=2:size(A,1)-1
L[i,j] = A[i+1,j] + A[i-1,j] + A[i,j+1] + A[i,j-1] - 4A[i,j]
end

Not only does the loop version have fewer keystrokes, it's a heck of a lot
clearer that what you're doing is computing the 2d discrete laplacian.
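
As a quick sanity check, a sketch (using a small random matrix and zero borders) confirming that the two formulations compute the same interior values:

```julia
A = rand(6, 6)
m, n = size(A)

# Vectorized version: linear combination of shifted submatrices
L1 = zeros(m, n)
L1[2:m-1, 2:n-1] = A[3:m, 2:n-1] + A[1:m-2, 2:n-1] + A[2:m-1, 3:n] +
                   A[2:m-1, 1:n-2] - 4A[2:m-1, 2:n-1]

# Loop version: the same 5-point stencil, written out per element
L2 = zeros(m, n)
for j = 2:n-1, i = 2:m-1
    L2[i, j] = A[i+1, j] + A[i-1, j] + A[i, j+1] + A[i, j-1] - 4A[i, j]
end

@assert L1 ≈ L2
```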

--Tim

Andreas Lobinger

May 21, 2014, 9:27:36 AM5/21/14
to julia...@googlegroups.com
Hello colleague,


On Wednesday, May 21, 2014 2:55:34 PM UTC+2, Tim Holy wrote:
I agree that vectorizing is very nice in some circumstances, and it will be
great if Julia can get to the point of not introducing so many temporaries.
(Like Tobi, I am not convinced this is an easy problem to solve.)

Unlike you and Tobi, I'm absolutely sure that this is a hard problem. But the first sentence on julialang.org is:

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

Therefore I'm expecting high-level SIMD language constructs.

But there are plenty of cases where loops are clearer even when vectorization
is both possible and efficient. For example, surely you can't tell me that

    L = similar(A)
    m,n = size(A)
    L[2:m-1,2:n-1] = A[3:m,2:n-1] + A[1:m-2,2:n-1] + A[2:m-1,3:n] +
A[2:m-1,1:n-2] - 4A[2:m-1, 2:n-1]

is easier to read and understand than

    L = similar(A)
    for j=2:size(A,2)-1, i=2:size(A,1)-1
        L[i,j] = A[i+1,j] + A[i-1,j] + A[i,j+1] + A[i,j-1] - 4A[i,j]
    end

Not only does the loops-version have fewer keystrokes, it's a heck of a lot
clearer that what you're doing is computing the 2d discrete laplacian.


1) ... you probably overlooked my sentence "you reorder more than you calculate. For problems like this explicit loops are often the better way"
2) Actually, by inspection I can read the first form as a linear combination of submatrices of A.
3) And I expect both versions to be encapsulated in an L = disc_laplace(A).



Patrick O'Leary

May 21, 2014, 9:41:29 AM5/21/14
to julia...@googlegroups.com
On Wednesday, May 21, 2014 8:27:36 AM UTC-5, Andreas Lobinger wrote:
Therefore i'm expecting high-level SIMD language constructs.

Well, there's this...
https://github.com/JuliaLang/julia/pull/5355

...and this work in progress...

https://github.com/JuliaLang/julia/pull/6271

...and if you need tighter control, that's been discussed as well...

https://github.com/JuliaLang/julia/issues/2299

Tobias Knopp

May 21, 2014, 9:41:53 AM5/21/14
to julia...@googlegroups.com
Julia does have high level SIMD constructs.
But again, does Matlab avoid temporaries in vectorized expressions?

The fact that for loops are slower than vectorized expressions in Matlab does not mean that the vectorized code is optimal or any better than what we have in Julia.

gael....@gmail.com

May 21, 2014, 1:11:45 PM5/21/14
to julia...@googlegroups.com
On Wednesday, May 21, 2014 at 14:55:34 UTC+2, Tim Holy wrote:
I agree that vectorizing is very nice in some circumstances, and it will be
great if Julia can get to the point of not introducing so many temporaries.
(Like Tobi, I am not convinced this is an easy problem to solve.)

But there are plenty of cases where loops are clearer even when vectorization
is both possible and efficient. For example, surely you can't tell me that

    L = similar(A)
    m,n = size(A)
    L[2:m-1,2:n-1] = A[3:m,2:n-1] + A[1:m-2,2:n-1] + A[2:m-1,3:n] +
A[2:m-1,1:n-2] - 4A[2:m-1, 2:n-1]

is easier to read and understand than

    L = similar(A)
    for j=2:size(A,2)-1, i=2:size(A,1)-1
        L[i,j] = A[i+1,j] + A[i-1,j] + A[i,j+1] + A[i,j-1] - 4A[i,j]
    end

I completely agree. When I was talking about vectorized code, I meant what Andreas showed.

 
Not only does the loops-version have fewer keystrokes, it's a heck of a lot
clearer that what you're doing is computing the 2d discrete laplacian.

--Tim

On Wednesday, May 21, 2014 05:09:57 AM Andreas Lobinger wrote:
> Hello colleague,
>
> On Wednesday, May 21, 2014 7:33:01 AM UTC+2, Andreas Noack Jensen wrote:
> > Please consider b += Diagonal(c) instead of diagm. Diagonal(c) only stores
> > the diagonal elements but works like diagm(c) for matrix arithmetic
> >
> > Vectorized code is always easier to understand
> >
> >
> > That is a strong statement. I have vectorised MATLAB code with repmat and
> > meshgrids that is completely unreadable, but would be fairly easy to
> > follow
> > if written as loops. I really enjoy that I can just write a loop in Julia
> > without slow execution.
Me too :)  You're right about my statement: I completely overlooked complicated slicing and the like. But I also said that part of what makes explicit loops harder to understand is the explicit indices. The same goes for "vectorized" code.

My whole point is that, in general, one should divide the problem into understandable pieces to write clear code. Vectorized expressions are one way to get that. For instance,

A -= Diagonal(B)

is very easy to understand, if not obvious. It's almost written in plain English, and that's a strength: no need to actually understand the code to grasp what it does.

for i = 1:length(B)
    A[i, i] -= B[i]
end
is not that hard to understand because it's really short, but there the only way to get the intent of the programmer is by browsing the whole code and understanding the whole thing.

To get back to Tim's example: one could think of creating a DiscreteLaplacian2D function taking a matrix A as an input. This is actually what I would do to keep the complexity of any given piece of code very low. Let's solve the heat equation iteratively; for that we need to calculate:

DiscreteDerivative(A, 1) - alpha.*DiscreteLaplacian2D(A, [2,3])

or even better, knowing the shape of A:

TimeDerivative(A) - alpha.*SpaceLaplacian2D(A)

are very easy to grasp and understand. I've not directly used loops: this is plain vectorized code. It's obvious and easy to maintain.

I'm sure you will agree. But what's the problem then? This is not really what you meant about vectorized code.

Well, let's handle the same problem with the current trend: this is not as fast as Julia could be; it might be fast enough, but definitely not the fastest. It's also using more RAM than it should, so why not rewrite it as loops? Yes, of course, why not! The only problem is that the only way to achieve something useful would be to solve the whole equation in one fat bunch of loops. Which means:
  • Duplicated code all over the place. You'll have to write (or copy/paste) your devectorized Laplacian for every single equation that needs it.
  • Unreadable code.
  • Longer code.
  • More places to make mistakes.
  • More places to search through to correct copy/paste mistakes.
  • A bigger mental map for your project.
  • Fewer external contributors.
  • More "end"s ;)

gael....@gmail.com

May 21, 2014, 1:28:43 PM5/21/14
to julia...@googlegroups.com
Just to add one thing: Julia should definitely not use "fast for loops" as a selling point. Once one has access to unboxed values, it's done; no hard work. This is a selling point in comparison to Matlab, Python and R, but not in general.

A very smart automatic devectorizer is much, much harder to get (and to define; "smart" is magic), and I would actually love to see exactly that as the main Julia selling point within a couple of years.
This is not something anyone using LLVM can provide within a few weeks for a new language.

The prospects would be tremendous, for code readability but also for automated parallelism.

Andreas Lobinger

May 21, 2014, 2:36:07 PM5/21/14
to julia...@googlegroups.com
Hello colleague,


On Wednesday, May 21, 2014 3:41:53 PM UTC+2, Tobias Knopp wrote:
Julia does have high level SIMD constructs.
But again, does Matlab avoid temporaries in vectorized expressions?

I don't know, but my Matlab experience (and I belong to the group of profiler users, and my Unix insight doesn't stop at ps) tells me that they somehow have a smart way of allocating temporaries. I was under the impression that the point of using a high-level language is that I do not need to care how the processing happens internally; I just write code that expresses what I want to calculate (I know this is an idealized environment, but still...). Matlab has been a JIT compiler for a few years now, and I assume that below a certain level of abstraction we would see technology like in Julia. One thing I'd bet a premium bottle of tomato ketchup on is that they have dedicated logic for array operators and an optimizer for them.

In Julia (and I'm writing this only as an observation on the discussions here; my Julia profiler and LLVM code skills are still evolving) something like A = A + B with A and B arrays seems to be handled as

allocate temp of size(A)
temp = +(A, B)
assign temp to A

while something like add!(A,B) would be the solution without additional allocation.
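
add! is not a Base function; a hand-rolled sketch of such an allocation-free update could look like:

```julia
# Hypothetical in-place addition: the effect of A = A + B with no temporary.
function add!(A::AbstractArray, B::AbstractArray)
    size(A) == size(B) || throw(DimensionMismatch("A and B must match in size"))
    @inbounds for i in eachindex(A, B)
        A[i] += B[i]
    end
    return A
end
```

On modern Julia, the fused broadcast A .+= B achieves the same effect without a helper.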

There might be a reason in the Julia architecture why it's like this, and I'm looking forward to seeing something like an expression-level optimizer, rather than depending on LLVM's internal optimizers.

LLVM provides infrastructure for array datatypes, but I'm missing information on how this is already used in Julia.
 
The fact that for loops are slower than vectorized expressions in Matlab does not mean that the vectorized code is optimal and any better than what we have in Julia.

There is a reason why there is a MAT in the name. Some time ago I was busy finding out why we hit a hard boundary when optimizing some runtime in a simulation, and (of course I could be misinterpreting something) it aligned with the FPU performance of the system. In a lot of places Matlab was slow, and in some places it still is. For matrix and vector operations, basic or linear algebra, Matlab is reasonably fast.
And exactly that situation leads to the effect that people try to vectorize things that shouldn't be vectorized.




John Myles White

May 21, 2014, 2:44:37 PM5/21/14
to julia...@googlegroups.com
I would be really sad if Julia were to become a language in which " i just write code that expresses what i want to calculate". Declarative languages can be really great (I'm a huge fan of domain-specific systems like JAGS), but they're a real pain to work with if you want to build something outside the scope of what the designers had in mind.

The single thing I most like about Julia is that Julia's performance model is so transparent and simple. The approach you're describing would make Julia much less useful to me.

-- John

Tobias Knopp

May 21, 2014, 5:44:50 PM5/21/14
to julia...@googlegroups.com
Reading your posts I get the impression that Matlab outperforms Julia on vectorized code. I think it would be really great if you could give some numbers to let us know how far Julia lags behind. It seems they have an outstanding JIT if it can transform vectorized expressions into optimal code without temporaries.

Tim Holy

May 21, 2014, 6:20:21 PM5/21/14
to julia...@googlegroups.com
I suspect they still generate temporaries, but probably have a better garbage
collector and can re-use the same temporary across iterations. #5227 should
take care of the first, and Jeff has posted his hopes that the second will also
be implemented (someday). I suspect both of those would be easier than
transforming vectorized expressions into optimized code without temporaries,
and would achieve many of the same benefits.

--Tim

Tobias Knopp

May 22, 2014, 3:51:44 AM5/22/14
to julia...@googlegroups.com
To give this discussion some facts, I have done some benchmarking of my own.

Matlab R2013a:

function [ y ] = perf( )
  N = 10000000;
  x = rand(N,1);
  y = x + x .* x + x .* x;
end

>> tic;y=perf();toc;
Elapsed time is 0.177664 seconds.

Julia 0.3 prerelease

function perf()
  N = 10000000
  x = rand(N)
  y = x + x .* x + x .* x
end

julia> @time perf()
elapsed time: 0.232852894 seconds (400002808 bytes allocated)

using Devectorize.jl

function perf_devec()
  N = 10000000
  x = rand(N)
  @devec y = x + x .* x + x .* x
end

julia> @time perf_devec()
elapsed time: 0.084605794 seconds (160000664 bytes allocated)

So this all seems pretty consistent to me. Matlab is a little better on vectorized code, as they presumably have better memory caching. But still, explicit devectorization using the @devec macro performs best. So using vectorized code in Julia is fine and "reasonably fast". If someone wants to do performance tweaking, I don't see an issue with telling them about devectorization.
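
For completeness, a fully devectorized counterpart (a sketch; perf_loop is my name, not from the thread, and it takes x as an argument so it can be checked against the vectorized expression):

```julia
# Hypothetical explicit-loop version of y = x + x .* x + x .* x
function perf_loop(x::Vector{Float64})
    y = similar(x)
    @inbounds for i in eachindex(x)
        y[i] = x[i] + x[i] * x[i] + x[i] * x[i]
    end
    return y
end
```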

gael....@gmail.com

May 22, 2014, 9:38:15 AM5/22/14
to julia...@googlegroups.com
Ahah!!! I was sure of it: we're not talking about the same thing. To me,
@devec y = x + x .* x + x .* x
is actually vectorized code :). When I talk about devectorizing code, I'm only talking about explicit loops. It's a shame that I only paid attention to Devectorize.jl last night. This thing is awesome, and it should be a great place to contribute to.

This should be the very first answer to "this part of my code is too slow".


Regarding the benchmarks you've done, thanks. Without evidence, no discussion. I agree.

But there are two problems with your benchmarks. Firstly, you've not repeated them and therefore can't associate an uncertainty with them; maybe the Matlab code is not actually faster. Secondly, what if Matlab or Julia actually spends most of its time generating the random vector?

I'd recommend you repeat your measurements and compare the estimated distribution functions directly. I've done just that for your simple code, moving the rand(...) call outside of the function each time. I also created devectorized versions of your code (I mean, with explicit loops written by myself), once with ".*" as the multiplier and once with "*". The resulting kernel densities can be found attached.

As you can see, the resulting distributions are not even close to Gaussian; normality tests failed for each of them. How to explain that? Easy: a typical desktop computer does nothing most of the time. Once you launch something, it spends its time on it, but from time to time, very rarely, it needs to spend some CPU time on something else. Hence there is a mode, the most probable execution time, and a tail that is bigger on the side of increasing times.

I just got the times using tic(), toc(), so for each of the thousand repetitions of the exact same calculation I could follow the execution time in real time. The fact that "Explicit loop (*)" has a bigger and stranger tail is directly related to the fact that I used my mouse quite intensively during that run. What does that mean?

1) One must repeat calculations for benchmarking.
2) Calculating the mean of the repetitions is useless, because the mean is not a good estimator of the mode of the distribution.

Point 1 is obvious, but point 2 is not: please consider the second plot attached, which is a zoomed-in part of the first one. As you can see, the mode of the "Explicit loop (*)" curve sits slightly to the left of the "Explicit loop (.*)" curve. This most likely means that for scalars, "*" has less overhead than ".*" (maybe just a couple of extra "if"s or an extra function call). However, because I shook my mouse (on purpose) during that run (it was so funny to see the execution time bump in the terminal), it has a bigger asymmetric tail.

The result? The ".*" version is on "average" 3% faster than the "*" version.

The problem? This is not a function benchmark, this is a system-load benchmark.

Oh my, I've done it again, sorry for the long post.
figure_1.svg
figure_2.svg

Tobias Knopp

May 22, 2014, 10:12:12 AM5/22/14
to julia...@googlegroups.com
Sure, I just made a very quick benchmark and it should not be taken too seriously. I just thought we should not speculate too much about what Matlab does, but rather measure it.

gael....@gmail.com

May 22, 2014, 10:39:56 AM5/22/14
to julia...@googlegroups.com


On Thursday, May 22, 2014 at 16:12:12 UTC+2, Tobias Knopp wrote:
Sure I just made a very quick benchmark and it should not be taken too seriously. I just thought we should not speculate too much on what Matlab does but better measure it.

That's how I understood it. (But I can't stop myself from shamelessly writing posts too long to read, so I put the emphasis there. :)

Dahua Lin

May 22, 2014, 12:19:05 PM5/22/14
to julia...@googlegroups.com
As a side note, I just cleaned up the Devectorize.jl package (tagged as v0.4). It now works well under Julia v0.3.

I am now working on a major upgrade to this package. It may lead to a more transparent & extensible code generator and support for arrays of arbitrary dimensions (with the help of the ArrayViews package).

Dahua

Tomas Lycken

May 23, 2014, 5:03:45 AM5/23/14
to julia...@googlegroups.com
I love reading this list. Aside from finding, over and over again, new ways in which Julia is awesome, I learn all kinds of stuff from all the side tracks you guys get into. If I had to buy a beer for everyone who's taught me something by expanding "a little too much" on something vaguely related to what we're actually talking about, I'd be a really poor guy afterwards.

Keep it up, I don't want to stop learning! =)

// T