Order of multiple for-loops

770 views
Skip to first unread message

Tk

unread,
Oct 29, 2015, 10:57:23 AM10/29/15
to julia-users
Hello,

In the following explicit for-loops,

for i = 1:2,
     j = 1:3,
     k = 1:4
    @show i,j,k
end

the index "k" runs fastest, as if this is written as three for(...) loops in C etc.
On the other hand, in the following array comprehension,

a = [ (@show i,j,k; i+j+k) for i=1:2, j=1:3, k=1:4 ]

the index "i" runs fastest. To understand this behavior, is it reasonable to think that
"an index closer to the expression to be evaluated runs faster"? Or is there any different
reason that naturally determines this behavior (in the latter example)?

# If we flatten the for-loops as "for i=1:2, j=1:3, k=1:4", both cases look the same.
So initially I got a bit confused (though I now got used to it).

# I tried to remember this order by analogy to the implied Do-loop syntax in Fortran, such as

a = [((( f(i,j,k), i=1,2 ), j=1,3 ), k=1,4 )))]

shere "i" runs fastest (obviously).

Tk

unread,
Oct 29, 2015, 10:59:54 AM10/29/15
to julia-users
PS. If this is concerned, I know that arrays in Julia are column-major order.

Stefan Karpinski

unread,
Oct 29, 2015, 1:24:23 PM10/29/15
to Julia Users
Yes, this is an unfortunate consequence of mathematics being column-major – oh how I wish it weren't so. The storage order is actually largely irrelevant, the whole issue stems from the fact that the element in the ith row and the jth column of a matrix is indexes as A[i,j]. If it were A[j,i] then these would agree (and many things would be simpler). I like your explanation of "an index closer to the expression to be evaluated runs faster" – that's a really good way to remember/explain it.

Greg Plowman

unread,
Oct 29, 2015, 5:00:04 PM10/29/15
to julia-users
I wouldn't say that storage order was irrelevant, but rather that mathematics order is different to Julia's storage order.
If Julia had row-major storage, I suspect order of comprehension loops would/should be the same as normal for-loops.
(Presumably order of comprehension loops now are so that they are constructed column-major order)

In fact it took me a while to write efficient loops for Julia's column-major ordering:

for j = 1:3, i = 1:2
  stuff
with A[i,j]
end

At first, this seemed "awkward" but I'm used to it now.

Conversely, comprehension loops are written how I "think", but order is reversed for column-major efficiency.

Steven G. Johnson

unread,
Oct 29, 2015, 11:11:27 PM10/29/15
to julia-users


On Thursday, October 29, 2015 at 1:24:23 PM UTC-4, Stefan Karpinski wrote:
Yes, this is an unfortunate consequence of mathematics being column-major – oh how I wish it weren't so.

I'm not sure what you mean by "mathematics being column major".  Fortran is column-major, and a lot of the linear-algebra libraries (LAPACK etc.) are column-major in consequence, and Matlab is column-major.

However, if Julia had chosen row-major, it would have been possible with a little care to call LAPACK and other column-major libraries without making a copy of any data, basically because for every linear-algebra operation there is an equivalent operation on the transpose of the matrix.

Probably too late to revisit this decision now, in any case.   It wouldn't matter much except that people have a natural tendency to write loops in row-major order, and Fortran compilers have had to fight against this for years.  (Sometimes they can detect and rearrange the loop order.)

Brendan Tracey

unread,
Oct 30, 2015, 9:52:07 AM10/30/15
to julia-users


On Thursday, October 29, 2015 at 9:11:27 PM UTC-6, Steven G. Johnson wrote:


On Thursday, October 29, 2015 at 1:24:23 PM UTC-4, Stefan Karpinski wrote:
Yes, this is an unfortunate consequence of mathematics being column-major – oh how I wish it weren't so.

I'm not sure what you mean by "mathematics being column major".  Fortran is column-major, and a lot of the linear-algebra libraries (LAPACK etc.) are column-major in consequence, and Matlab is column-major.

However, if Julia had chosen row-major, it would have been possible with a little care to call LAPACK and other column-major libraries without making a copy of any data, basically because for every linear-algebra operation there is an equivalent operation on the transpose of the matrix.

It's not that simple. QR decomposition (dqeqrf) does not allow you to specify the transpose of the matrix directly.  Sure, you can compute A^T instead, but then it's trickier to use the result to do the follow-on operations like linear solving and computing the condition number.

Tamas Papp

unread,
Oct 30, 2015, 10:21:40 AM10/30/15
to julia...@googlegroups.com
On Fri, Oct 30 2015, Steven G. Johnson <steve...@gmail.com> wrote:

> However, if Julia had chosen row-major, it would have been possible with a
> little care to call LAPACK and other column-major libraries without making
> a copy of any data, basically because for every linear-algebra operation
> there is an equivalent operation on the transpose of the matrix.

Not in every case. I implemented a LAPACK bindings in Common Lisp [1]
which is row-major, and had to jump through a lot of hoops, occasionally
resorting to a transpose.

> Probably too late to revisit this decision now, in any case. It wouldn't
> matter much except that people have a natural tendency to write loops in
> row-major order, and Fortran compilers have had to fight against this for
> years. (Sometimes they can detect and rearrange the loop order.)

For better or worse, the majority scientific code is column major. I am
glad that Julia made this choice, a lot fewer headaches than
row-major. There is nothing intrinsically better about column-major, and
probably in an alternate universe everyone is using it, and they use a
different convention for electric charge [2], etc, but in this one it is
better to follow predominant conventions.

Best,

Tamas

[1] https://github.com/tpapp/lla
[2] https://xkcd.com/567/

Glen H

unread,
Oct 30, 2015, 10:46:36 AM10/30/15
to julia-users

On Thursday, October 29, 2015 at 1:24:23 PM UTC-4, Stefan Karpinski wrote:
Yes, this is an unfortunate consequence of mathematics being column-major – oh how I wish it weren't so. The storage order is actually largely irrelevant, the whole issue stems from the fact that the element in the ith row and the jth column of a matrix is indexes as A[i,j]. If it were A[j,i] then these would agree (and many things would be simpler). I like your explanation of "an index closer to the expression to be evaluated runs faster" – that's a really good way to remember/explain it.


To help understand, is "math being column major" referring to matrix operations in math textbooks are done by columns?  For example:


While the order is by convention (eg not that is has to be that way), this is how people are taught.

Glen

John Gibson

unread,
Oct 30, 2015, 11:44:38 AM10/30/15
to julia-users
Agreed w Glenn H here. "math being column major" is because the range of a matrix being the span of its columns, and consequently most linear algebra algorithms are naturally expressed as operations on the columns of the matrix, for example QR decomp via Gramm-Schmidt or Householder, LU without pivoting, all Krylov subspace methods. An exception would be LU with full or partial row pivoting. 

I think preference for row-ordering comes from the fact that the textbook presentation of matrix-vector multiply is given as computing y(i)= sum_j A(i,j) x(j) for each value of i. Ordering the operations that way would make row-ordering of A cache-friendly. But if instead you understand mat-vec mult as forming a linear combination of the columns of A, and you do the computation via y = sum_j A(:,j) x(j), column-ordering is cache-friendly. And it's the latter version that generalizes into all the important linear algebra algorithms.

John

Tom Breloff

unread,
Oct 30, 2015, 12:25:36 PM10/30/15
to julia-users
Preference for row ordering is more related to the way people view tabular data, I think.  For many applications/data, there are many more rows than columns (think of database tables or csv files), and it's slightly unnatural to read those into a transposed data structure for analysis.  Putting a time series into a TxN matrix is natural (that's likely how the data is stored), but inefficient if data access is sequentially through time.  Without a "TransposeView" or similar, we're forced to make a choice between poor performance or an unintuitive representation of the data.

I can appreciate that matrix operations could be more natural with column ordering, but in my experience practical applications favor row ordering.

Stefan Karpinski

unread,
Oct 30, 2015, 1:50:45 PM10/30/15
to Julia Users
Yes, this is what I meant by "mathematics is column major" – the fact that a vector is conventionally identified with the columns of matrices and that you multiply vectors by matrices as columns from the right instead of as rows from the left. This is at odds with the fact that in English we read left-to-right then top-to-bottom. So I suppose that if we read top-to-bottom first, we would also avoid this issue, but that seems like an even bigger change.

On Fri, Oct 30, 2015 at 11:44 AM, John Gibson <johnf...@gmail.com> wrote:

Tomas Lycken

unread,
Oct 30, 2015, 1:54:10 PM10/30/15
to julia-users

in my experience practical applications favor row ordering

The keyword there is that it was your experience. The thing about something as central to number crunching as organizing memory layouts in row- or column major orderings, is that there are more practical applications than any of us can imagine - in fact, most likely we couldn’t come up with all possible cases where one is significantly better than the other even if we stopped discussing Julia and just discussed that for a year.

For some applications column major is better, but for others row major rocks. We have to choose, and even if the choice was once quite arbitrary, the gain from switching is so extremely low it’s ridiculous to even consider it. Thus, I don’t think it’s fruitful to discuss here.

However, Julia is an incredibly expressive language, and you could quite easily write a macro that re-orders the indices for you:

julia> macro rowmajor(expr)
       @assert expr.head == :ref

       Expr(:ref, expr.args[1], expr.args[end:-1:2]...)
       end

julia> macroexpand(:(@rowmajor A[i,j]))
:(A[j,i])

Of course, this only works on one indexing expression at a time, but it can be extended to work recursively through an expression tree and rewrite all array indexing expressions it finds, allowing usage as

@rowmajor begin
    # formulate row-major algorithm here
end

No need to change the language - just bend it to do what you need :)

// T

Tom Breloff

unread,
Oct 30, 2015, 3:13:49 PM10/30/15
to julia-users
 We have to choose

Tomas: I agree with most of what you said, except for this part.  I do think that Julia is expressive and flexible enough that we shouldn't have to choose one way for everyone.  I made this suggestion a little while ago and I still think it's a reasonable goal for the language.  (and I still find that picture hilarious)  It's Julia... we can have our cake and eat it too... right?

To be clear... I do not want to switch everything to be row-based, but I do like the idea of abstracting the ordering, so that either paradigm can be used.

Tk

unread,
Oct 30, 2015, 5:32:36 PM10/30/15
to julia-users
Thanks very much for various information. And I am sorry if my question might be misleading or indicate
confusion or complaints about memory order (indeed, column-major is perfectly fine for me).
So let me explain a bit more about my question...

Given that Julia's arrays are column-major order, the following loop is most efficient for m x n array A:

     for i2=1:n,
           i1=1:m
           A[ i1, i2 ] = 10 * i1 + i2
    end

or equivalently,

     for i2=1:n,  i1=1:m                  # (1)
           A[ i1, i2 ] = 10 * i1 + i2
    end

To create the same array using comprehension, I once thought that the expression might be
something like this

     A = [ 10 * i1 + i2  for  i2=1:n, i1=1:m ]         # (2)

where the index most far from "for" runs faster. Then for-loops in (1) and (2) would take the same form.
In fact, Julia employs a different definition

    A = [ 10 * i1 + i2  for i1=1:m, i2=1:n ]         # (3)

where the index closer to "for" runs faster.
So part of my question was why the definition (2) was not employed (within the column-major order)
but (3) was employed.

But come to think of it..., the definition (2) is clearly not nice-looking (because the indices are swapped)
while Julia's definition in (3) is more intuitive (although the order in (1) and (3) is different).
So I guess this would be the reason for this choice (sorry if this is a silly question...).
And I now understand Stephan's comment about row-major order (thanks :)

RE how to memorize the definition (3), the following form is close to (3):

    B = [ [ 10 * i1 + i2 for i1=1:m ] for i2=1:n ]

although this gives an array of arrays. So, the definition (3) may be viewed as a "rectangularized" or serialized
version of B. This is also easy for me to memorize it.... (Btw, the above expression is similar to implied do-loops
in Fortran, but in fact the content is very different!)

Anyway, thanks very much :)

Tk

unread,
Oct 30, 2015, 5:49:03 PM10/30/15
to julia-users
As for "column-majorness" of matrix algebra, it seems to be natural (for me) because it is common to write a vector in column form. But of course, one could start with a row vector as the basic building block, so it looks like a matter of convention...

The reason for this choice might ultimately be traced back to the greater ratio of right-handed people (just my guess!).
As English is written from left to right so as to be convenient for right-handed people with pen (again a guess!!),
old mathematician in German, France, and US (where people often write from left to right) might have preferred to
use right eigenvectors A x = lambda x rather than left eigenvectors x A = lambda x, in such a way that sentences begin with
capital letters (Hello rather than helloW). Also, people may prefer reading tabular data from left to right because it is
close to reading a sentence in a book. So the true question is why there are more right-handed people :)

And because there are countries where people write from top to bottom or right to left, it would have been interesting
if matrix algebra originated from such countries...

Glen O

unread,
Oct 31, 2015, 2:42:45 AM10/31/15
to julia-users
That's the long, complicated way to do it, in my opinion.

An easier approach in the long run would be to simply define a new Array type (RMArray (for Row Major Array), perhaps), and overload the getindex and setindex functions for them, etc.

Indeed, perhaps someone could make a package that does this? I'm kind of surprised there isn't already such a package.

DNF

unread,
Oct 31, 2015, 11:44:42 AM10/31/15
to julia-users
To me it makes sense to think of the loops as 'inner' and 'outer'. So here:
for i2=1:n,
       i1=1:m
       A[ i1, i2 ] = 10 * i1 + i2
 end

i1 is the 'inner' loop variable. And here:
A = [ 10 * i1 + i2  for  i2=1:n, i1=1:m ]

i1 is the 'outer' loop variable. Makes things quite easy to think about. The inner loop is always fastest.

As for column vs row majority, I like that the fastest dimension is either the first or the last dimension, and then the first is surely easiest. Having the *second* dimension/index being fastest, just seems a bit arbitrary. The first index is also the 'inner' index in my mind.
Reply all
Reply to author
Forward
0 new messages