
Identity Matrix


dance...@earthlink.net

Nov 14, 2008, 1:16:56 PM
Anybody have a slicker way of creating an identity matrix NxN?

I am currently using:

A(1:N,1:N) = 0
forall (I = 1:N) A(I,I) = 1

Thanks.

Chris

jwm

Nov 14, 2008, 2:37:13 PM

Well, maybe this one's a little better:

A = RESHAPE([((MERGE(1, 0, I == J), I = 1, N), J = 1, N)], [N, N])

although there might be issues with the assignment, e.g. memory
usage; hopefully, others will explain it better.
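[For reference, a minimal complete program around this one-liner might look like the following sketch; the names N and A are taken from the thread, and the print format assumes N = 4:]

```fortran
! Build an NxN identity matrix with a single RESHAPE over an
! implied-do list; MERGE picks 1 on the diagonal, 0 elsewhere.
! The inner implied-do runs over i first, matching RESHAPE's
! column-major fill order, so element (i,j) gets merge(1,0,i==j).
program identity_reshape
  implicit none
  integer, parameter :: n = 4
  integer :: a(n, n), i, j
  a = reshape([((merge(1, 0, i == j), i = 1, n), j = 1, n)], [n, n])
  print '(4i3)', (a(i, :), i = 1, n)
end program identity_reshape
```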

Gordon Sande

Nov 14, 2008, 2:47:55 PM

forall ( i = 1:n, j = 1:n ) a(i,j) = (i/j) * (j/i)

Classic example of poor style from "Elements of Programming Style"
updated to use forall.

Not recommended but it sure is slick! Or is that sick?
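[The trick works because integer division truncates: i/j and j/i are both 1 only when i == j; otherwise the smaller-over-larger quotient truncates to 0, zeroing the product. A small sketch, not from the original post:]

```fortran
! (i/j)*(j/i) is 1 exactly when i == j; for i /= j one of the
! two integer quotients truncates to 0, so the product is 0.
program div_trick
  implicit none
  integer, parameter :: n = 4
  integer :: a(n, n), i, j
  forall (i = 1:n, j = 1:n) a(i, j) = (i / j) * (j / i)
  print '(4i3)', (a(i, :), i = 1, n)   ! rows of the 4x4 identity
end program div_trick
```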

Richard Maine

Nov 14, 2008, 3:04:16 PM
<dance...@earthlink.net> wrote:

> Anybody have a slicker way of creating an identity matrix NxN?
>
> I am currently using:
>
> A(1:N,1:N) = 0
> forall (I = 1:N) A(I,I) = 1

Maybe it is implied in the 2 other replies (so far), one of which
specifically mentioned that it was a published example of poor
programming style, and the other of which mentions (accurately) that it
needs explanation and might have poor performance....

But nobody quite explicitly said what I would, namely that there is
nothing horribly wrong with your original. It is short, simple, clear,
efficient, and other good things. Exactly what is it that you are
looking to improve?

The only things I'd do differently, both of which are mostly personal
style choices rather than objectively better, are

1. I'd just write the first line as

A = 0

2. I almost never use forall. I'd use a DO loop instead.
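[Put together, those two choices might look like this minimal sketch (A and N as in the thread, with a whole-array zero assignment followed by a plain DO loop over the diagonal):]

```fortran
! Whole-array assignment zeroes A; an ordinary DO loop then
! sets the diagonal. Short, clear, and easy for compilers.
program identity_do
  implicit none
  integer, parameter :: n = 4
  integer :: a(n, n), i
  a = 0
  do i = 1, n
    a(i, i) = 1
  end do
  print '(4i3)', (a(i, :), i = 1, n)
end program identity_do
```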

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

dance...@earthlink.net

Nov 14, 2008, 3:40:48 PM
On Nov 14, 12:04 pm, nos...@see.signature (Richard Maine) wrote:

Thanks for your comments.

Yes, I noticed that both suggestions contain more operations. I am
trying to minimize the number of operations. When using my method the
number of operations > NxN + N, i.e. first you set all of the array
elements to 0 then you set the diagonal to 1. I thought perhaps there
was a slick method using implied do loops (or other constructs with
forall, where, or fancy triplet descriptors) that would simply do an
NxN set of operations.

BTW, in answer as to why I specified A(1:N,1:N) rather than A is that
I am only using a portion of a larger matrix.

best regards,

Chris

dance...@earthlink.net

Nov 14, 2008, 3:56:35 PM
> Chris

P.s.

I used a forall because I was under the impression that both forall
and where allow more freedom in compiler optimization, particularly
with multi-processor hardware. (plus I like how it reads compared to
a do loop).

Perhaps I am mistaken in this?

c

Glen Herrmannsfeldt

Nov 14, 2008, 4:00:36 PM
dance...@earthlink.net wrote:
(snip)

>>> A(1:N,1:N) = 0
>>> forall (I = 1:N) A(I,I) = 1

(snip)

> Yes, I noticed that both suggestions contain more operations. I am
> trying to minimize the number of operations. When using my method the
> number of operations > NxN + N, i.e. first you set all of the array
> elements to 0 then you set the diagonal to 1. I thought perhaps there
> was a slick method using implied do loops (or other constructs with
> forall, where, or fancy triplet descriptors) that would simply do an
> NxN set of operations.

do i=1,N
   do j=1,N
      a(j,i)=0
   enddo
   a(i,i)=1
enddo

has more locality of reference than separate loops,
which might be important for a large array.

> BTW, in answer as to why I specified A(1:N,1:N) rather than A is that
> I am only using a portion of a larger matrix.

-- glen

Richard Maine

Nov 14, 2008, 4:06:01 PM
<dance...@earthlink.net> wrote:

> Yes, I noticed that both suggestions contain more operations. I am
> trying to minimize the number of operations. When using my method the
> number of operations > NxN + N, i.e. first you set all of the array
> elements to 0 then you set the diagonal to 1. I thought perhaps there
> was a slick method using implied do loops (or other constructs with
> forall, where, or fancy triplet descriptors) that would simply do an
> NxN set of operations.

Operation counts can be highly misleading. Some operations are orders of
magnitude worse than others. There are classic examples of attempts to
optimize code that end up making it slower instead because people have
done things like that.

Setting the array to zero can be *VERY* efficient - enough so that odds
are high that it is more efficient to just set the whole array to zero
than to add the extra code needed to special case the diagonal. It is
easy enough to do, for example

do j = 1 , n
   do i = 1 , n
      if (i==j) then
         a(i,j) = 1
      else
         a(i,j) = 0
      end if
   end do
end do

but I'd bet good money that this is no more efficient, and likely less
so, than the first approach.

All that aside, one would usually expect that the resources required to
create an identity matrix like this would be negligible compared to
other things likely going on (such as actually using the matrix). One of
the other basics of optimization is to figure out what parts of the code
are important to work on optimizing; this doesn't seem likely to be a
good candidate.

For things like this, clarity is almost certainly more important than
speed. I highly recommend the "Elements of Programming Style" book that
Gordon mentioned. It was written long enough ago that the Fortran
examples are Fortran 66. The fact that the points made are still
relevant is a testimony to the book.

> BTW, in answer as to why I specified A(1:N,1:N) rather than A is that
> I am only using a portion of a larger matrix.

Makes sense.

Glen Herrmannsfeldt

Nov 14, 2008, 4:09:27 PM
dance...@earthlink.net wrote:
(snip)

> I used a forall because I was under the impression that both forall
> and where allow more freedom in compiler optimization, particularly
> with multi-processor hardware. (plus I like how it reads compared to
> a do loop).

> Perhaps I am mistaken in this?

FORALL has some strange restrictions, but it doesn't seem that
they help much for multi-processor hardware. They might help
for vector processors, but those are rare these days.
(Not counting SSE, though.) It seems to me not quite restrictive
enough for some uses, and not really better than the alternatives
for more ordinary uses.

-- glen

Richard Maine

Nov 14, 2008, 4:13:18 PM
<dance...@earthlink.net> wrote:

> I used a forall because I was under the impression that both forall
> and where allow more freedom in compiler optimization, particularly
> with multi-processor hardware. (plus I like how it reads compared to
> a do loop).
>
> Perhaps I am mistaken in this?

Forall was intended to facilitate optimization on multi-processor
systems. Unfortunately, it is pretty widely regarded as a failure in
that regard. Compilers had gotten pretty good at doing DO loops. In
practice, FORALL usually ends up no better than DO, and sometimes it
ends up a lot slower.

If you think it reads better, that's a much better reason, in my
opinion. As noted elsethread, performance is unlikely to be important in
this case. Legibility is.

dance...@earthlink.net

Nov 14, 2008, 5:04:28 PM
On Nov 14, 1:06 pm, nos...@see.signature (Richard Maine) wrote:
>
>   do j = 1 , n
>     do i = 1 , n
>        if (i==j) then
>           a(i,j) = 1
>        else
>           a(i,j) = 0
>        end if
>     end do
>   end do
>
> but I'd bet good money that this is no more efficient, and likely less
> so, than the first approach.
>
>
> --
> Richard Maine                    | Good judgment comes from experience;
> email: last name at domain . net | experience comes from bad judgment.
> domain: summertriangle           |  -- Mark Twain


Clearly the nested do loops that you show, Richard, include an IF test;
therefore the number of operations >= 2*NxN. I do know about the
difference in operations (for example a multiplication usually takes
longer than an addition/subtraction, a division even longer). A
logical expression (x == 1) most likely takes longer than a simple
assignment (x = 1). One involves several assembly instructions
whereas the other is a simple poke. In all of this you certainly need
to count up the number of clock cycles to execute the compiled
assembly code.

Anyway, I hear what you are saying. All in all I find it interesting
that the forall operation is generally poorly implemented.

Chris

Thomas Koenig

Nov 14, 2008, 5:08:48 PM
On 2008-11-14, Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> do i=1,N
> do j=1,N
> a(j,i)=0
> enddo
> a(i,i)=1
> enddo

What about

do i=1,N
   do j=1,N
      a(j,i) = transfer(i.eq.j, 1)
   end do
end do

?

(No, I'm not serious :-)

Glen Herrmannsfeldt

Nov 14, 2008, 5:16:00 PM
dance...@earthlink.net wrote:

> On Nov 14, 1:06 pm, nos...@see.signature (Richard Maine) wrote:

>> do j = 1 , n
>> do i = 1 , n
>> if (i==j) then
>> a(i,j) = 1
>> else
>> a(i,j) = 0
>> end if
>> end do
>> end do

>>but I'd bet good money that this is no more efficient, and likely less
>>so, than the first approach.

(snip)

> Clearly the nested do loops that you show, Richard, include an IF test;
> therefore the number of operations >= 2*NxN. I do know about the
> difference in operations (for example a multiplication usually takes
> longer than an addition/subtraction, a division even longer).

That is the old rule, and still somewhat true, but less
so than in the old days. On most machines, the timing is
dominated by memory access, especially stores. This one
has better locality of reference on the store, though
it depends somewhat on how the compiler optimizes it.

do j = 1 , n
   do i = 1 , n
      t=0
      if (i==j) t=1
      a(i,j) = t
   end do
end do

May (or may not) optimize better. If it can be done
with a conditional load, even better with 0, 1, and t
in registers, then it is probably the fastest.

I don't know if compilers will figure out how to use
a conditional load on Richard's version.

> A logical expression (x == 1) is most likely longer than a simple
> assignment (x = 1). One involves several assembly code commands
> whereas the other is a simple poke. In all of this you certainly need
> to count up the number of clock cycles to execute the compiled
> assembly code.

Conditional load and integer compare are pretty fast.
Store into cache is fast, too, but slow going to main memory.
Multiply has gotten faster over the years, and instruction overlap
makes any generalization on timing almost impossible.

> Anyway, I hear what you are saying. All in all I find it interesting
> that the forall operation is generally poorly implemented.

-- glen

George

Nov 14, 2008, 5:19:11 PM

A more roundabout way is to compile the slatec library, create a
non-singular square array, calculate its inverse and multiply:

c input a 3x3 matrix
c invert it
c display the inverse
c display products of original x inverse

c234567

      real a(3,3),b(3,3),c(3,3)
      integer ipvt(3)
      real z(3), work(3)
      real det(2)
      character*1 transa, transb

      lda = 3
      n = 3

      open(10,file='in.txt')
      do i=1,3
         read(10,*) (a(i,j),j=1,3)
      end do

c b <- a
      do i=1,3
         do j=1,3
            b(i,j)=a(i,j)
         end do
      end do

      print *,'original'
      print *,'a='
      do i=1,3
         write(*,*) (a(i,j),j=1,3)
      end do

c factor matrix a using gaussian elimination
c recond = condition number (>0 == OK)

      call sgeco (a, lda, n, ipvt, recond, z)
      print *,'recond=',recond
      print *,'ipvt=',ipvt

      print *,'factored'
      print *,'a='
      do i=1,3
         write(*,*) (a(i,j),j=1,3)
      end do

      job = 1+10 ! inverse & determinant
      det(1) = 0
      det(2) = 0

c calculate inverse -> a & determinant -> det = det(1) x 10^det(2)

      call sgedi (a, lda, n, ipvt, det, work, job)
      print *,'det=',det

      print *,'inverse'
      print *,'a='
      do i=1,3
         write(*,*) (a(i,j),j=1,3)
      end do

      transa = 'N' ! not trans or conj
      transb = 'N' ! not trans or conj

      m = 3 ! rows of a
      n = 3 ! cols of b
      k = 3 ! cols of a, rows of b
      lda = 3 ! leading a
      ldb = 3 ! leading b
      ldc = 3 ! leading c
      alpha = 1.0
      beta = 1.0

c c <- 0

      do i=1,3
         do j=1,3
            c(i,j) = 0
         end do
      end do

c c = alpha*A[] x B[] + beta*C

      call sgemm (transa, transb, m, n, k, alpha, a, lda, b, ldb,
     $            beta, c, ldc)

      print *,'inverse * original'
      print *,'c='
      do i=1,3
         write(*,*) (c(i,j),j=1,3)
      end do

c c <- 0

      do i=1,3
         do j=1,3
            c(i,j) = 0
         end do
      end do

c c = alpha*B[] x A[] + beta*C

      call sgemm (transb, transa, m, n, k, alpha, b, ldb, a, lda,
     $            beta, c, ldc)

      print *,'original * inverse'
      print *,'c='
      do i=1,3
         write(*,*) (c(i,j),j=1,3)
      end do

      end

C:\MinGW\source>g95 x.f -lslatec -o x.exe

C:\MinGW\source>x
original
a=
1. 2. 3.
0. 1. 4.
5. 6. 0.
recond= 0.002756242
ipvt= 3 2 3
factored
a=
5. 6. 0.
-0. 1. 4.
-0.2 -0.79999995 -0.19999981
det= 9.99999 -1.
inverse
a=
-24.000025 18.00002 5.0000052
20.00002 -15.000014 -4.000004
-5.000005 4.0000033 1.000001
inverse * original
c=
1.0000014 9.536743E-7 0.
0. 1. 0.
0. -4.7683716E-7 0.99999905
original * inverse
c=
0.99999905 4.7683716E-7 4.7683716E-7
0. 0.99999905 0.
-0.0000076293945 0.000005722046 1.0000038

C:\MinGW\source>

It's a longer process than the original, but you have more tools once you
get there.
--
George

The course of this conflict is not known, yet its outcome is certain.
Freedom and fear, justice and cruelty, have always been at war, and we know
that God is not neutral between them.
George W. Bush

Picture of the Day http://apod.nasa.gov/apod/

Glen Herrmannsfeldt

Nov 14, 2008, 5:21:20 PM
Richard Maine wrote:
(snip)

> Setting the array to zero can be *VERY* efficient - enough so that odds
> are high that it is more efficient to just set the whole array to zero
> than to add the extra code needed to special case the diagonal.

Setting an array to zero might be very efficient; in particular, some
processors might have a very fast way to zero large blocks of memory.
Compilers aren't so good at figuring that out, though.

C has memset() which can be used to set a large block of memory
to (binary) zeros. That has a good chance of using any special
block move instructions that a processor might have. (If hand
written in assembler.)

Compilers might special case setting a whole array to zero.

-- glen

Glen Herrmannsfeldt

Nov 14, 2008, 5:38:43 PM
Thomas Koenig wrote:
(snip)

> do i=1,N
> do j=1,N
> a(j,i) = transfer(i.eq.j, 1)
> end do
> end do

> (No, I'm not serious :-)

Is there a Fortran equivalent to C's

a[i][j]=(i==j)?1:0;

That is probably the most likely to be implemented
with conditional load. I think there is a Fortran
function similar to the conditional operator,
but I forget the name.

-- glen


Craig Powers

Nov 14, 2008, 5:46:14 PM

That would be MERGE. If it's really implemented as a function, it
certainly would not produce optimal performance.
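[As a sketch, MERGE standing in for the C conditional expression (the program and names are illustrative, not from the thread; note that MERGE evaluates both alternatives, which is harmless here since both are constants):]

```fortran
! MERGE(tsource, fsource, mask) is Fortran's analogue of
! C's (mask ? tsource : fsource).
program merge_ternary
  implicit none
  integer, parameter :: n = 3
  integer :: a(n, n), i, j
  do j = 1, n
    do i = 1, n
      a(i, j) = merge(1, 0, i == j)   ! like (i==j) ? 1 : 0
    end do
  end do
  print '(3i3)', (a(i, :), i = 1, n)
end program merge_ternary
```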

James Van Buskirk

Nov 14, 2008, 9:54:03 PM
<dance...@earthlink.net> wrote in message
news:8c99ce04-ea0d-4e13...@q26g2000prq.googlegroups.com...

> Anybody have a slicker way of creating an identity matrix NxN?

> I am currently using:

> A(1:N,1:N) = 0
> forall (I = 1:N) A(I,I) = 1

Warning: untested!
A(1:N,1:N)=reshape([integer::],[N,N],reshape([1],[N+1],[0]))

OK, I couldn't resist testing:

C:\gfortran\clf\identity>type identity.f90
integer, parameter :: N = 3
integer A(N,N)

A(1:N,1:N)=reshape([integer::],[N,N],reshape([1],[N+1],[0]))
write(*,*) A
write(*,*) reshape([1],[N+1],[0])
end

C:\gfortran\clf\identity>gfortran identity.f90 -oidentity

C:\gfortran\clf\identity>identity
1 1 0 1635151433 1 1
0 1635151433 1
1 0 0 0

C:\gfortran\clf\identity>type identity.f90
integer, parameter :: N = 3
integer A(N,N)

A(1:N,1:N)=reshape([integer::],[N,N],reshape([1],[N+1],[0]))
write(*,*) A
!write(*,*) reshape([1],[N+1],[0])
end

C:\gfortran\clf\identity>gfortran identity.f90 -oidentity

C:\gfortran\clf\identity>identity
1 1635151433 1634541682 543453807 1 1635151433 1634541682 543453807 1

C:\gfortran\clf\identity>gfortran -v
Built by Equation Solution (http://www.Equation.com).
Using built-in specs.
Target: x86_64-pc-mingw32
Configured with:
../gcc-4.4-20081017-mingw/configure --host=x86_64-pc-mingw32 --
build=x86_64-unknown-linux-gnu --target=x86_64-pc-mingw32 --prefix=/home/gfortra
n/gcc-home/binary/mingw32/native/x86_64/gcc/4.4-20081017 --with-gmp=/home/gfortr
an/gcc-home/binary/mingw32/native/x86_64/gmp --with-mpfr=/home/gfortran/gcc-home
/binary/mingw32/native/x86_64/mpfr --with-sysroot=/home/gfortran/gcc-home/binary
/mingw32/cross/x86_64/gcc/4.4-20081017 --with-gcc --with-gnu-ld --with-gnu-as
--
disable-shared --disable-nls --disable-tls --enable-libgomp --enable-languages=c
,fortran --enable-threads=win32 --disable-win32-registry
Thread model: win32
gcc version 4.4.0 20081017 (experimental) (GCC)

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


andy2O

Nov 15, 2008, 6:26:43 AM
On Nov 15, 2:54 am, "James Van Buskirk" <not_va...@comcast.net> wrote:
> <dancerch...@earthlink.net> wrote in message

>
> news:8c99ce04-ea0d-4e13...@q26g2000prq.googlegroups.com...
>
> > Anybody have a slicker way of creating an identity matrix NxN?
> > I am currently using:
> >         A(1:N,1:N) = 0
> >         forall (I = 1:N) A(I,I) = 1
>

A lot of discussion about code, but I would be quite surprised if this
operation (generating an identity matrix) is the bottleneck in a real
algorithm! Is this really the best code section to optimise in terms
of effort / return?

1) If N is large, then you should be using sparse, not dense matrices
in your algorithm.

2) If N is not large, and A is reset to be the identity *very* often,
then perhaps you do have a good reason, but generally improving the
overall algorithm will give better returns than spending time thinking
about a simple code section such as this.

Your original version is concise and clear, and perfectly sensible.
I'd stick with it (or use a DO loop instead of FORALL at most).

Regards,
andy

Ron Shepard

Nov 15, 2008, 11:09:49 AM
In article
<ed1ce306-1816-4918...@f40g2000pri.googlegroups.com>,
andy2O <and...@hotmail.com> wrote:

> > >         A(1:N,1:N) = 0
> > >         forall (I = 1:N) A(I,I) = 1
> >

[...]


> Your original version is concise and clear, and perfectly sensible.
> I'd stick with it (or use a DO loop instead of FORALL at most).

The problem with FORALL in general is that it is not a looping
structure, it is another way of writing an array assignment. This
is not what programmers said that they needed in the 80's when the
semantics of the feature were being discussed, and the feature was
not added to f90 because of this disagreement between the programmers
and the compiler writers. When it was included in f95, the compiler
writers won and the programmers did not get what they needed. This
distinction probably does not apply to the above code because the
right hand side expression is so simple, the optimization steps in
the compiler will probably do the right thing in the end anyway.
But in general, the semantics of the FORALL require the rhs to be
fully evaluated and stored in a temporary array, and then the
results are moved into the variable locations specified on the lhs
of the statement. This makes FORALL less useful and more prone to
inefficiencies than it should have been.

The most useful semantics, certainly in hindsight, was what fortran
programmers in the 80's recognized was important, namely a looping
structure in which the order of assignments could be rearranged by
the compiler in any way at all. This allows for efficient
vectorization on vector processors and efficient pipelining on RISC
processors, it allows for elimination of memory bank conflicts, it
allows for reduction of cache overflows and other NUMA
optimizations, it allows very general loop unrolling and
out-of-order execution, for parallel execution it allows for
asynchronous execution of separate threads, and it does not require
allocation of temporary vectors within innermost loops. The
semantics of FORALL prevent some or all of those things from
occurring.

If you do want to replace the FORALL in the above with a loop, then
something like

do i=1,n; A(i,i)=1; enddo

is almost as short as the FORALL and almost as easy to read by
humans. The problem with the do loop (although probably not an
issue in this particular case) is that the order of execution is
implied by the semantics, and that also inhibits various
optimizations from occurring.

$.02 -Ron Shepard

Glen Herrmannsfeldt

Nov 17, 2008, 4:19:13 PM
Ron Shepard wrote:
(snip)

> The problem with FORALL in general is that it is not a looping
> structure, it is another way of writing an array assignment. This
> is not what programmers said that they needed in the 80's when the
> semantics of the feature were being discussed, and the feature was
> not added to f90 because of this disagreement between the programmers
> and the compiler writers.

Yes. FORALL still requires all expressions on the right side to
be evaluated before any variable on the left side is changed.

FORALL doesn't restrict the order of evaluation on the right side,
or the order of assignment after the values have been determined.

> When it was included in f95, the compiler
> writers won and the programmers did not get what they needed. This
> distinction probably does not apply to the above code because the
> right hand side expression is so simple, the optimization steps in
> the compiler will probably do the right thing in the end anyway.

Yes. Well, I suppose most of the time there are no variables
used on both sides of the assignment. If the compiler can assume
no aliasing, then it can do the assignment in any order.

> But in general, the semantics of the FORALL require the rhs to be
> fully evaluated and stored in a temporary array, and then the
> results are moved into the variable locations specified on the lhs
> of the statement. This makes FORALL less useful and more prone to
> inefficiencies than it should have been.

Most of the time the programmer knows when changes on the left
can affect expressions on the right. FORALL doesn't leave any
way for the programmer to indicate that to the compiler.

> The most useful semantics, certainly in hindsight, was what fortran
> programmers in the 80's recognized was important, namely a looping
> structure in which the order of assignments could be rearranged by
> the compiler in any way at all. This allows for efficient
> vectorization on vector processors and efficient pipelining on RISC
> processors, it allows for elimination of memory bank conflicts, it
> allows for reduction of cache overflows and other NUMA
> optimizations, it allows very general loop unrolling and
> out-of-order execution, for parallel execution it allows for
> asynchronous execution of separate threads, and it does not require
> allocation of temporary vectors within innermost loops. The
> semantics of FORALL prevent some or all of those things from
> occurring.

Well, FORALL does allow the assignment in any order, but only
after all the expressions have been evaluated. Not quite
good enough for the cases listed above, though. Vector
processors are pretty much out of style now, though.
The other cases are still important.

-- glen

Ron Shepard

Nov 18, 2008, 12:02:21 AM
In article <gfsn6j$nvc$1...@aioe.org>,
Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Vector
> processors are pretty much out of style now, though.
> The other cases are still important.

Why would you say that? All of the G4 (1998) and G5 (2003) PPC CPUs
and all of the x86 CPUs since the PIII (1999) have had vector
hardware. And now it is becoming possible to program the GPUs as
vector processors, so I would say that as far as general purpose
computing goes, vector processing is becoming more important
recently, not less.

$.02 -Ron Shepard

Glen Herrmannsfeldt

Nov 18, 2008, 2:14:15 AM
Ron Shepard wrote:
(I wrote)

>> Vector processors are pretty much out of style now, though.
>> The other cases are still important.

> Why would you say that? All of the G4 (1998) and G5 (2003) PPC CPUs
> and all of the x86 CPUs since the PIII (1999) have had vector
> hardware. And now it is becoming possible to program the GPUs as
> vector processors, so I would say that as far as general purpose
> computing goes, vector processing is becoming more important
> recently, not less.

The Cray-1, a once popular vector machine, has eight vector
registers that each hold 64 elements of 64 bits each.
(I believe 64 bit is single precision floating point for the Cray-1.)

According to http://en.wikipedia.org/wiki/Cray-1 the Cray-1 has
about as many gates as an intel 386, so one might expect larger
vector registers in a more modern vector machine.

It seems that SSSE3 on the very new (much newer than PIII)
intel processors has 128 bit registers, or two 64 bit values.

http://en.wikipedia.org/wiki/SSSE3

Since even 64 elements is small for an array or matrix
column/row, even larger registers would be useful.
While two elements is technically a vector, I don't think
that is what Cray would have called it.

-- glen

George

Nov 18, 2008, 4:28:19 AM
On Tue, 18 Nov 2008 00:14:15 -0700, Glen Herrmannsfeldt wrote:

> Since even 64 elements is small for an array or matrix
> column/row, even larger registers would be useful.
> While two elements is technically a vector, I don't think
> that is what Cray would have called it.

I bumped into a similar issue today when I queried slatec for a
determinant. The declaration was real det(2), which I would say is a
vector.

One way you can make a scalar into a vector is to write it in scientific
notation and have -68 be (-6.8, 1). I didn't realize my hunch was correct
until I multiplied (-68) by -1.4705881 -2, obtaining unity.


C:\MinGW\source>x
Hello World

1 5 0. 0.
2 6 NaN 1.060997895E-314
3 0 0. 0.
4 0 0. 0.
5 32 0. 0.
6 4
7 2
8 31
9 2147483647
10 2
11 24
12 -125
13 127
14 53
15 -1021
16 1023

1.00000 6.00000 5.00000
3.00000 4.00000 -3.00000
2.00000 2.00000 2.00000

-0.20588 0.02941 0.55882
0.17647 0.11765 -0.26471
0.02941 -0.14706 0.20588
-6.8 1.

1.00000 6.00000 5.00000
3.00000 4.00000 -3.00000
2.00000 2.00000 2.00000
-1.4705881 -2.

C:\MinGW\source>

Why would slatec give the determinant of the inverse?
--
George

The once all-powerful ruler of Iraq was found in a hole, and now sits in a
prison cell.

Ron Shepard

Nov 18, 2008, 11:24:07 AM
In article <gftq2b$mo6$1...@aioe.org>,
Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Ron Shepard wrote:
> (I wrote)
>
> >> Vector processors are pretty much out of style now, though.
> >> The other cases are still important.
>
> > Why would you say that? All of the G4 (1998) and G5 (2003) PPC CPUs
> > and all of the x86 CPUs since the PIII (1999) have had vector
> > hardware. And now it is becoming possible to program the GPUs as
> > vector processors, so I would say that as far as general purpose
> > computing goes, vector processing is becoming more important
> > recently, not less.
>
> The Cray-1, a once popular vector machine, has eight vector
> registers that each hold 64 elements of 64 bits each.
> (I believe 64 bit is single precision floating point for the Cray-1.)
>
> According to http://en.wikipedia.org/wiki/Cray-1 the Cray-1 has
> about as many gates as an intel 386, so one might expect larger
> vector registers in a more modern vector machine.

First, the Cray cpus did not have on-chip cache. The functionality
of on-chip fast memory was instead performed in the register set
which had to be manually loaded.

Second, the SSE cpus have 8 new registers that are 128 bits wide.
That is enough to hold 16 64-bit floating point numbers (or 32
32-bit numbers), and that is enough to call a "vector" by any
definition.

Finally, just look at what the SSE instructions do. They do vector
operations and they speed up programs that do vector and matrix
arithmetic.

The Cray was certainly a leader in the field, but by the early to
mid 80's there were many cpu architectures that supported "vector"
operations through some combination of large register sets, special
fast memory, pipelined instructions, and parallel execution of the
floating point instruction units. This includes various VLIW and
RISC processors along with special vector hardware attached to more
generic processors. The SSE instructions were based in part on
intel's i860 RISC CPU from the late 80's, which performed vector and
matrix operations very efficiently through a combination of on-chip
cache and pipelined instruction set.

And with programmable GPUs performing vector operations now, and in
the near future, it still seems to me that vector processing is
becoming more important over time, not less.

$.02 -Ron Shepard

dance...@earthlink.net

Nov 18, 2008, 12:52:00 PM
On Nov 18, 8:24 am, Ron Shepard <ron-shep...@NOSPAM.comcast.net>
wrote:
> In article <gftq2b$mo...@aioe.org>,

>  Glen Herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
> > Ron Shepard wrote:
> > (I wrote)
>
> > >> Vector processors are pretty much out of style now, though.
> > >> The other cases are still important.
>
> > > Why would you say that?  All of the G4 (1998) and G5 (2003) PPC CPUs
> > > and all of the x86 CPUs since the PIII (1999) have had vector
> > > hardware.  And now it is becoming possible to program the GPUs as
> > > vector processors, so I would say that as far as general purpose
> > > computing goes, vector processing is becoming more important
> > > recently, not less.
>
> > The Cray-1, a once popular vector machine, has eight vector
> > registers that each hold 64 elements of 64 bits each.
> > (I believe 64 bit is single precision floating point for the Cray-1.)
>
> > According to http://en.wikipedia.org/wiki/Cray-1 the Cray-1 has
> $.02 -Ron Shepard

I believe the Intel Core i7 has a return to SSE instructions (plus
on-core cache and memory controller)

JB

Nov 20, 2008, 10:58:42 AM
On 2008-11-18, Ron Shepard <ron-s...@NOSPAM.comcast.net> wrote:
> Second, the SSE cpus have 8 new registers that are 128 bits wide.
> That is enough to hold 16 64-bit floating point numbers (or 32
> 32-bit numbers), and that is enough to call a "vector" by any
> definition.

FWIW, in x86-64 mode there are 16 (128-bit) SSE registers.

> Finally, just look at what the SSE instructions do. They do vector
> operations and they speed up programs that do vector and matrix
> arithmetic.

They can do scalar operations as well. In fact, it's the standard way
of doing single and double precision floating point on most (I don't
know of a counter-example, so if I were less careful I'd just say
'all') x86-64 ABIs.

> And with programmable GPUs performing vector operations now, and in
> the near future, it still seems to me that vector processing is
> becoming more important over time, not less.

GPUs implement something like the 'stream programming model'

http://en.wikipedia.org/wiki/Stream_processing

which is sort of an extension of vector processing. Basically, instead
of allowing primitive instructions on multiple elements, it has the
concept of user-provided 'kernel functions' that operate on each
element in an input stream (=array). I suppose this has a passing
resemblance to ELEMENTAL procedures in Fortran, but AFAIK there is
typically some support for (limited) scatter/gather access in stream
kernels.
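[For what it's worth, the ELEMENTAL resemblance can be sketched like this (a hypothetical example, not from the thread): the function is written for scalars and the compiler applies it per element across conforming arrays, much like a stream kernel over its input arrays.]

```fortran
! An ELEMENTAL function acts like a per-element "kernel":
! written for scalars, applied elementwise to whole arrays.
program elemental_kernel
  implicit none
  integer, parameter :: n = 3
  integer :: i, a(n, n)
  ! Row-index and column-index arrays built with SPREAD, then
  ! the elemental function applied elementwise gives the identity.
  a = delta(spread([(i, i = 1, n)], 2, n), &
            spread([(i, i = 1, n)], 1, n))
  print '(3i3)', (a(i, :), i = 1, n)
contains
  elemental integer function delta(i, j)
    integer, intent(in) :: i, j
    delta = merge(1, 0, i == j)
  end function delta
end program elemental_kernel
```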

Also see

http://merrimac.stanford.edu/publications/sc03_merrimac.pdf

The good news here is that a standard with most of the GPU chip
industry behind it is in the works:

http://en.wikipedia.org/wiki/OpenCL

http://www.hpcwire.com/blogs/OpenCL_On_the_Fast_Track_33608199.html

The bad news is that it's an extension of C99 rather than Fortran.

--
JB

Tobias Burnus

unread,
Nov 20, 2008, 11:49:57 AM11/20/08
to
On Nov 15, 3:54 am, "James Van Buskirk" <not_va...@comcast.net> wrote:
> C:\gfortran\clf\identity>type identity.f90
> A(1:N,1:N)=reshape([integer::],[N,N],reshape([1],[N+1],[0]))
[...]

>            1           1           0  1635151433           1           1
>    0  1635151433           1
>            1           0           0           0

Support for empty SOURCE was accidentally not implemented in gfortran;
this is fixed now:

1 0 0 0 1 0 0 0 1
1 0 0 0

Could you please, please report gfortran bugs either in bugzilla or
sent them to the gcc-fortran mailing list; I had almost missed this
"bug report", which would have been a pity.

We are trying hard to fix all reported bugs but without knowing about
a bug, we cannot fix it. Thus user feedback is really crucial. I think
the barrier is quite low (bugzilla, email list, where one does not
need to be subscribed to in order to send an email), but if anyone has
an idea how to reduce the barrier further, I'm happy to hear about it.

Tobias

James Van Buskirk

unread,
Nov 20, 2008, 4:28:23 PM11/20/08
to
"Tobias Burnus" <bur...@net-b.de> wrote in message
news:0de217e0-f735-4e65...@a12g2000yqm.googlegroups.com...

> Support for empty SOURCE was accidentally not implemented in gfortran;
> this is fixed now:

> 1 0 0 0 1 0 0 0 1
> 1 0 0 0

> Could you please, please report gfortran bugs either in bugzilla or
> sent them to the gcc-fortran mailing list; I had almost missed this
> "bug report", which would have been a pity.

> We are trying hard to fix all reported bugs but without knowing about
> a bug, we cannot fix it. Thus user feedback is really crucial. I think
> the barrier is quite low (bugzilla, email list, where one does not
> need to be subscribed to in order to send an email), but if anyone has
> an idea how to reduce the barrier further, I'm happy to hear about it.

Well, there are some issues with the normal channels for me. Thank
you for fixing this bug even so. One question I have is that when a
problem is detected with a specific usage, is it normal practice to
create test cases for all 3 classes of expressions? In this case
there may be some additional issues with initialization expressions.
See comments in code below:

C:\gfortran\clf\identity>type identity3.f90
module funcs
implicit none
contains
subroutine initialization()
integer, parameter :: N = selected_real_kind(15,300)
integer, parameter :: A(1:N,1:N) = &
   reshape([integer::],[N,N],reshape([1],[N+1],[0]))

! The next two attempts failed:
! integer, parameter :: K = sum(A)
! integer, parameter :: K = N*A(2,2)+A(2,3)
integer, parameter :: K = N
real(K) x

x = 4*atan(real(1,kind(x)))
write(*,*) 'In initialization: x = ', x
end subroutine initialization

function specification(N)
integer N
character(sum( &
reshape([integer::],[N,N],reshape([1],[N+1],[0])) &
)) specification

specification = repeat('X',len(specification))
end function specification

subroutine ordinary(N)
integer N
integer A(N,N)
character(80) fmt

write(fmt,'(a,i0,a)') '(',N,'(i1:1x))'
A(1:N,1:N) = &
   reshape([integer::],[N,N],reshape([1],[N+1],[0]))

write(*,'(a)') 'In ordinary: A ='
write(*,fmt) transpose(A)
end subroutine ordinary
end module funcs

program identity3
use funcs
implicit none
integer N

N = 3
call initialization()
write(*,'(a)') 'Results from specification:'
write(*,'(a,i0,a,i0)') &
'len(specification(',N,')) = ',len(specification(N))
write(*,'(a,i0,a,a)') &
'specification(',N,') = ',specification(N)
call ordinary(N)
end program identity3

C:\gfortran\clf\identity>gfortran -std=f2003 identity3.f90 -oidentity3

C:\gfortran\clf\identity>identity3
In initialization: x = 3.1415926535897931
Results from specification:
len(specification(3)) = 75
specification(3) =
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXX
In ordinary: A =
1 0 *
0 1 0
* 0 1

Now, when I leave the first commented line in force, I get:

...
! The next two attempts failed:
integer, parameter :: K = sum(A)
! integer, parameter :: K = N*A(2,2)+A(2,3)
! integer, parameter :: K = N
...

C:\gfortran\clf\identity>gfortran -std=f2003 identity3.f90 -oidentity3
identity3.f90:9.34:

integer, parameter :: K = sum(A)
1
Error: transformational intrinsic 'sum' at (1) is not permitted in an
initialization expression
identity3.f90:12.15:

real(K) x
1
Error: Symbol 'k' at (1) has no IMPLICIT type
identity3.f90:14.10:

x = 4*atan(real(1,kind(x)))
1
Error: Symbol 'x' at (1) has no IMPLICIT type
identity3.f90:41.12:

use funcs
1
Fatal Error: Can't open module file 'funcs.mod' for reading at (1): No such
file or directory
gfortran: Internal error: Aborted (program f951)
Please submit a full bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

And the second line yields:

...
! The next two attempts failed:
! integer, parameter :: K = sum(A)
integer, parameter :: K = N*A(2,2)+A(2,3)
! integer, parameter :: K = N
...

C:\gfortran\clf\identity>gfortran -std=f2003 identity3.f90 -oidentity3
identity3.f90:10.50:

integer, parameter :: K = N*A(2,2)+A(2,3)
1
Error: Initialization expression didn't reduce (1)
identity3.f90:12.15:

real(K) x
1
Error: Symbol 'k' at (1) has no IMPLICIT type
identity3.f90:14.10:

x = 4*atan(real(1,kind(x)))
1
Error: Symbol 'x' at (1) has no IMPLICIT type
identity3.f90:41.12:

use funcs
1
Fatal Error: Can't open module file 'funcs.mod' for reading at (1): No such
file or directory
gfortran: Internal error: Aborted (program f951)
Please submit a full bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

The first line should pass or be marked with an error message that
simply says that this valid f03 initialization expression is not yet
implemented in gfortran. The second line may be due to RESHAPE
issues or something else.

In general any innocent-looking expression has 3 potential faces and
it is worthwhile to check to make sure that it works with all three
compilers (the initialization expression compiler, the specification
expression compiler, and the ordinary expression compiler).

Charles Coldwell

unread,
Nov 23, 2008, 9:50:15 PM11/23/08
to
Glen Herrmannsfeldt <g...@ugcs.caltech.edu> writes:

> Thomas Koenig wrote:
> (snip)
>
>> do i=1,N
>> do j=1,N
>> a(j,i) = transfer(i.eq.j, 1)
>> end do
>> end do
>
>> (No, I'm not serious :-)
>
> Is there a Fortran equivalent to C's
>
> a[i][j]=(i==j)?1:0;

Actually, in C it would be just

a[i][j] = (i==j)

no need for the conditional.

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
GPG Key ID: 852E052F
GPG Key Fingerprint: 77E5 2B51 4907 F08A 7E92 DE80 AFA9 9A8F 852E 052F

Glen Herrmannsfeldt

unread,
Nov 23, 2008, 10:52:21 PM11/23/08
to
Charles Coldwell wrote:
(snip, I wrote)

>>Is there a Fortran equivalent to C's

>> a[i][j]=(i==j)?1:0;

> Actually, in C it would be just

> a[i][j] = (i==j)

> no need for the conditional.

Yes, but I wanted the Fortran equivalent of the
conditional. I knew there was one, but didn't find it.

In the Fortran 2003 standard it is described under the
Array construction functions section, 13.5.13.
(Even though it doesn't require an array.)

I thought about the other form just after I posted,
though reasonably likely I would have used it in C code.

-- glen
