Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Efficient matrix transpose in c
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  23 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
NM  
View profile  
 More options Dec 3 2002, 4:51 pm
Newsgroups: sci.math.num-analysis
From: "NM" <n...@nm.com>
Date: Tue, 3 Dec 2002 15:52:12 -0600
Local: Tues, Dec 3 2002 4:52 pm
Subject: Efficient matrix transpose in c
I need to transpose a matrix in C. The problem is that due to cache behavior
the performance is very poor. If anyone knows efficient matrix transpose
algorithm or has some code can you please help me?

Thanks

NM


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
wng  
View profile  
 More options Dec 3 2002, 6:18 pm
Newsgroups: sci.math.num-analysis
From: w...@uiuc.edu
Date: Tue, 03 Dec 2002 23:16:04 GMT
Local: Tues, Dec 3 2002 6:16 pm
Subject: Re: Efficient matrix transpose in c
You don't really need to store your transpose matrix.  If you need to
mulitply your transpose matrix with a vector, you can simply write a short
routine for doing that.  It's all a matter of indexing.

W.C.

Citing NM <n...@nm.com> from his previous post:
# I need to transpose a matrix in C. The problem is that due to cache behavior
# the performance is very poor. If anyone knows efficient matrix transpose
# algorithm or has some code can you please help me?

--


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
NM  
View profile  
 More options Dec 3 2002, 6:39 pm
Newsgroups: sci.math.num-analysis
From: "NM" <n...@nm.com>
Date: Tue, 3 Dec 2002 17:39:20 -0600
Local: Tues, Dec 3 2002 6:39 pm
Subject: Re: Efficient matrix transpose in c
I only need the transpose operation. What the next steps will do is not
important to me right now. I was wondering if there is any block algorithm
or something like that which will make use of the cache when transposing the
matrix. If there is a pointer to such algorithm I will be really grateful.

Thanks
NM

<w...@uiuc.edu> wrote in message

news:UWaH9.1486$Vf3.16125@vixen.cso.uiuc.edu...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pearu Peterson  
View profile  
 More options Dec 3 2002, 7:00 pm
Newsgroups: sci.math.num-analysis
From: Pearu Peterson <pe...@cens.ioc.ee>
Date: Wed, 4 Dec 2002 01:55:01 +0200
Local: Tues, Dec 3 2002 6:55 pm
Subject: Re: Efficient matrix transpose in c

On Tue, 3 Dec 2002, NM wrote:
> I need to transpose a matrix in C. The problem is that due to cache behavior
> the performance is very poor. If anyone knows efficient matrix transpose
> algorithm or has some code can you please help me?

Do you need in-place transpose?

I think an efficient as well a simple matrix transpose algorithm for
in-place transpose is to make element-wise copy of a matrix and then
memory-copy back to the original matrix. An obvious alternative, that is
swaping matrix elements in-place, is much slower.

This works nicely if the size of a matrix is, say, an order
of magnitude smaller than the size of the physical memory in a
computer. Otherwise, you should review your algorithm to avoid
explicit matrix transpose from the first place.

Pearu


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jive Dadson  
View profile  
 More options Dec 3 2002, 9:24 pm
Newsgroups: sci.math.num-analysis
From: Jive Dadson <dsdfdsad...@isdfssdfasdf.invalid>
Date: Tue, 03 Dec 2002 18:24:04 -0800
Local: Tues, Dec 3 2002 9:24 pm
Subject: Re: Efficient matrix transpose in c

w...@uiuc.edu wrote:

> You don't really need to store your transpose matrix.

That's certainly true if he is starting from scratch, but who would want to do that?  If he already has both C code and Fortran code that he needs to use, he may be stuck with actually moving data around.

I would suggest giving MTL (C++) a look.  I'm a bit turned off by the fact that it uses so-called "shallow copy" semantics, but it should prove educational in any case.

Jive


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George W. Bush  
View profile  
 More options Dec 3 2002, 10:40 pm
Newsgroups: sci.math.num-analysis
From: george.b...@whitehouse.com (George W. Bush)
Date: Wed, 04 Dec 2002 03:34:09 GMT
Local: Tues, Dec 3 2002 10:34 pm
Subject: Re: Efficient matrix transpose in c
You might look at TOMS 380

http://www.netlib.org/toms/380

The code is in Fortran but can be easily ported to another language.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Frank  
View profile  
 More options Dec 4 2002, 2:51 am
Newsgroups: sci.math.num-analysis
From: "Frank" <alvar...@comcast.net>
Date: Wed, 4 Dec 2002 02:51:14 -0500
Local: Wed, Dec 4 2002 2:51 am
Subject: Re: Efficient matrix transpose in c
Here is how I would transpose a 3x3 matrix in C++:

matrix33 &matrix33::transpose()
{
  float t;

  for (unsigned int c = 0; c < 3; c++)
  {
    for (unsigned int r = c + 1; r < 3; r++)
    {
      t = col[c][r];
      col[c][r] = col[r][c];
      col[r][c] = t;
    }
  }

  return *this;

}

hope that helps


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michel OLAGNON  
View profile  
 More options Dec 4 2002, 3:11 am
Newsgroups: sci.math.num-analysis
From: molag...@ifremer.fr (Michel OLAGNON)
Date: 4 Dec 2002 08:07:57 GMT
Local: Wed, Dec 4 2002 3:07 am
Subject: Re: Efficient matrix transpose in c

In article <asj91t$r7...@news.cs.utexas.edu>, "NM" <n...@nm.com> writes:
>I need to transpose a matrix in C. The problem is that due to cache behavior
>the performance is very poor. If anyone knows efficient matrix transpose
>algorithm or has some code can you please help me?

How often ? What are the relative sizes of the matrix, cache and overall
memory ? Is the matrix square or rectangular ? Are the dimensions fixed,
or dependent on the run ?

Michel


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mike Deskevich  
View profile  
 More options Dec 4 2002, 11:06 am
Newsgroups: sci.math.num-analysis
From: mikedeskev...@yahoo.com (Mike Deskevich)
Date: 4 Dec 2002 08:06:49 -0800
Local: Wed, Dec 4 2002 11:06 am
Subject: Re: Efficient matrix transpose in c
if the cache is your only issue, then i can see no way to speed up the
algorithm.  if you don't need to do the transpose in place, then
you're going to be doing something like this

loop over indices by column in matrix 1
loop over indices by row in matrix 1
take value from (column,row) in matrix 1 and put it in (row,column) in
matrix 2
end loop row
end loop column

using c matrices as an example, the last index is the one that changes
most rapidly, thus it's the one close in cache.  so to optimize the
use of the cache you'd want to loop so that the row loop is the inner
loop.  however during the transpose, the rows will become the columns
in matrix 2 so that'll screw up the caching.  if  you go the other way
and have the rows in matrix 2 be the inner loop, then matrix 1 will
screw up your caching.  i know of no way around this.

if you have to do the transpose inplace then you'll be doing a bunch
of swapping of elements that'll be far apart in memory and there's no
hope of being friendly to the cache.

mike


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Han-Wen Nienhuys  
View profile  
 More options Dec 4 2002, 12:13 pm
Newsgroups: sci.math.num-analysis
From: han...@meddo.cs.uu.nl (Han-Wen Nienhuys)
Date: 4 Dec 2002 17:07:51 GMT
Local: Wed, Dec 4 2002 12:07 pm
Subject: Re: Efficient matrix transpose in c
In article <71734b21.0212040806.d758...@posting.google.com>,

Mike Deskevich <mikedeskev...@yahoo.com> wrote:
>if the cache is your only issue, then i can see no way to speed up the
>algorithm.  if you don't need to do the transpose in place, then
>you're going to be doing something like this
>in matrix 2 so that'll screw up the caching.  if  you go the other way
>and have the rows in matrix 2 be the inner loop, then matrix 1 will
>screw up your caching.  i know of no way around this.

>if you have to do the transpose inplace then you'll be doing a bunch
>of swapping of elements that'll be far apart in memory and there's no
>hope of being friendly to the cache.

If you're really paranoid about efficiency, you could try to use
prefetch instructions to lessen cache impact. These instructions hint
the CPU that certain pieces of data will be needed in the future. The
CPU can respond by loading them into cache in advance.

Newer versions of GCC support this instruction on some targets.  I did
some experimenting with prefetching on a PIII (gcc 3.2), but I was
never able to gain significant performance increase for vectorized
operations, though.

--

Han-Wen Nienhuys   |   han...@cs.uu.nl    | http://www.cs.uu.nl/~hanwen/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ron Shepard  
View profile  
 More options Dec 4 2002, 1:02 pm
Newsgroups: sci.math.num-analysis
From: Ron Shepard <ron-shep...@NOSPAM.attbi.com>
Date: Wed, 04 Dec 2002 18:02:13 GMT
Local: Wed, Dec 4 2002 1:02 pm
Subject: Re: Efficient matrix transpose in c
In article <71734b21.0212040806.d758...@posting.google.com>,
 mikedeskev...@yahoo.com (Mike Deskevich) wrote:

> if you have to do the transpose inplace then you'll be doing a bunch
> of swapping of elements that'll be far apart in memory and there's no
> hope of being friendly to the cache.

Transposing in-place by sub-blocks will eliminate some of the cache
misses.

The simple approach only works in-place for square matrices, of
course; the in-place rectangular case is more complicated (that is
what the TOMS algorithm does).

$.02 -Ron Shepard


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Eric Rudd  
View profile  
 More options Dec 4 2002, 1:57 pm
Newsgroups: sci.math.num-analysis
From: Eric Rudd <r...@cyberoptics.com>
Date: Wed, 04 Dec 2002 12:57:09 -0600
Local: Wed, Dec 4 2002 1:57 pm
Subject: Re: Efficient matrix transpose in c

NM wrote:
> I need to transpose a matrix in C. The problem is that due to cache behavior
> the performance is very poor. If anyone knows efficient matrix transpose
> algorithm or has some code can you please help me?

Have a look at

     Michael R. Portnoff, "An Efficient Parallel-Processing Method for
     Transposing Large Matrices in Place," IEEE Trans. Image Proc. vol. 8
     (1999 September) 1265-1275.

I haven't used the algorithm, but it looks appropriate for what you're doing.
Also, some FFT algorithms involve transposing arrays, so you may wish to survey
that literature.  In particular, there is a detailed discussion of
high-performance transposing in the following book:

   http://www.ec-securehost.com/SIAM/FR10.html

Hope this helps.

-Eric Rudd


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin Brown  
View profile  
 More options Dec 5 2002, 4:44 am
Newsgroups: sci.math.num-analysis
From: Martin Brown <martin.br...@pandora.be>
Date: Thu, 05 Dec 2002 09:44:57 GMT
Local: Thurs, Dec 5 2002 4:44 am
Subject: Re: Efficient matrix transpose in c

There are a whole bunch of algorithms that aim to minimise the number of cache
misses by tranposing carefully chosen sizes of local sub blocks that will just fit
into cache and then shuffling the blocks to their final destination. Most of the
old large scale 2D FFT codes used something similar in the '80 s to get acceptable
performance with large arrays under very tight physical memory constraints.

Naive array transpose of a large 2D image array used to sometimes grind demand
paged machines of that era to a complete standstill. Modern kit has a lot more
memory to play with and often 2 levels of cache.

Regards,
Martin Brown


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jentje Goslinga  
View profile  
 More options Dec 5 2002, 8:17 am
Newsgroups: sci.math.num-analysis
From: Jentje Goslinga <gosli...@shaw.ca>
Date: Thu, 05 Dec 2002 12:58:18 GMT
Local: Thurs, Dec 5 2002 7:58 am
Subject: Re: Efficient matrix transpose in c
Hello NM,

There was a C code posted for this in 1998 by
Jason Stratos Papadopoulos in this newsgroup.
I remember it being quite fast.

Jentje Goslinga


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
NM  
View profile  
 More options Dec 5 2002, 10:51 am
Newsgroups: sci.math.num-analysis
From: "NM" <n...@nm.com>
Date: Thu, 5 Dec 2002 09:51:16 -0600
Local: Thurs, Dec 5 2002 10:51 am
Subject: Re: Efficient matrix transpose in c
Transpose is taking too much time. For example a 4096 x 4096 matrix
containing complex numbers ( 2 double) is taking about 10 second whereas
copying takes about 1.6 sec. It is currently a bottleneck of what i am
doing. I don't know of the cache size, overall memory is 512MB. The matrix
can be rectangular. dimensions are dependent in the run.

NM

"Michel OLAGNON" <molag...@ifremer.fr> wrote in message

news:askd4t$o90$1@iletudy.ifremer.fr...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
NM  
View profile  
 More options Dec 5 2002, 11:02 am
Newsgroups: sci.math.num-analysis
From: "NM" <n...@nm.com>
Date: Thu, 5 Dec 2002 10:02:40 -0600
Local: Thurs, Dec 5 2002 11:02 am
Subject: Re: Efficient matrix transpose in c
Thanks

NM

"Jentje Goslinga" <gosli...@shaw.ca> wrote in message

news:3DEF4D75.2060208@shaw.ca...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pearu Peterson  
View profile  
 More options Dec 5 2002, 12:54 pm
Newsgroups: sci.math.num-analysis
From: Pearu Peterson <pe...@cens.ioc.ee>
Date: Thu, 5 Dec 2002 19:53:51 +0200
Local: Thurs, Dec 5 2002 12:53 pm
Subject: Re: Efficient matrix transpose in c

On Thu, 5 Dec 2002, NM wrote:
> Transpose is taking too much time. For example a 4096 x 4096 matrix
> containing complex numbers ( 2 double) is taking about 10 second whereas
> copying takes about 1.6 sec. It is currently a bottleneck of what i am
> doing. I don't know of the cache size, overall memory is 512MB. The matrix
> can be rectangular. dimensions are dependent in the run.

Indeed, transposing large matrices in-place by swaping its elements takes
more time (but less memory) when compared to a transpose algorithm
consisting of element-wise copying. Here are few test results (on PII
400MHz, 196MB RAM):

transposing 2048x2048 matrix
----------------------------
$ time ./main 1 2048 10                
size=2048, repeat=10 transpose=transpose_inplace_swap

real    0m6.277s
user    0m5.860s
sys     0m0.410s

$ time ./main 2 2048 10
size=2048, repeat=10 transpose=transpose_inplace_copy

real    0m3.578s
user    0m2.470s
sys     0m1.100s

$ time ./main 3 2048 10
size=2048, repeat=10 transpose=transpose_inplace_copy_cache

real    0m2.838s
user    0m2.460s
sys     0m0.360s

transposing 512x512 matrix
--------------------------
$ time ./main 1 512 50
size=512, repeat=50 transpose=transpose_inplace_swap

real    0m1.874s
user    0m1.820s
sys     0m0.040s

$ time ./main 2 512 50
size=512, repeat=50 transpose=transpose_inplace_copy

real    0m2.026s
user    0m1.170s
sys     0m0.840s

$ time ./main 3 512 50
size=512, repeat=50 transpose=transpose_inplace_copy_cache

real    0m1.180s
user    0m1.140s
sys     0m0.040s

Note that for smaller matrices the difference between above methods are
smaller. transpose_inplace_swap becomes more efficient than
transpose_inplace_copy_cache if the size of a matrix is less that 200-250.

When increasing the size of a matrix, transpose_inplace_copy_cache becomes
more and more efficent than transpose_inplace_swap until physical memory
limit is hit.

Source code for ./main can be found in
  http://cens.ioc.ee/~pearu/misc/transpose/

Regards,
        Pearu


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bill  
View profile  
 More options Dec 5 2002, 1:20 pm
Newsgroups: sci.math.num-analysis
From: "Bill" <pe...@cminet.net>
Date: Thu, 5 Dec 2002 10:54:39 -0500
Local: Thurs, Dec 5 2002 10:54 am
Subject: Re: Efficient matrix transpose in c
Hi NM

  As others have pointed out there is no fast way
to do a transpose in place for you have to individuslly
access A[i][j] for all ij. You need to scan the matrix A
across the columns in the row major direction ( I assume
you have stored it this way) and then store each row
in the columns of a new matrix B.
   This is easy if you are using a modern C++ linear
algebra package with row and column iterators
and have the standard library. Its
just a few lines

 Matrix     transpose(const Matrix& m) {
            Matrix result(m.cols(),m.rows());
            for (int i=0; i<m.rows(); i++)

std::copy(m.row_begin(i),m.row_end(i),result.column_begin(i));
            return result;
         }

If you have to use the older C++ libraries with pointers it
looks something like  this snippet

typedef float T;  // could also be double, complex etc

Matrix     transpose(const Matrix& m)
 {
 Matrix result(m.cols(),m.rows());   // the new matrix

 T * mp   =  m.Data() ; // m element pointer
 T * rep  =  result.Data() ; // result element pointer
 T * rowp = m.Data() ; //  row end pointer

            for (int i=0; i<m.rows(); i++)
   {
    rep = result.Data() +i ; // set column pointer to top of column
    rowp =  += m.rows();  // set pointer to end of row

     while(mp < rowp){
     *rep = *mp;
     mp++;
     rep += m.cols();
     }

            return result;
         }

regards... Bill Shortall


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
E. Robert Tisdale  
View profile  
 More options Dec 5 2002, 1:57 pm
Newsgroups: sci.math.num-analysis
From: "E. Robert Tisdale" <E.Robert.Tisd...@jpl.nasa.gov>
Date: Thu, 05 Dec 2002 13:35:27 -0500
Local: Thurs, Dec 5 2002 1:35 pm
Subject: Re: Efficient matrix transpose in c
Something that calls itself NM wrote:

> I need to transpose a matrix in C.

The C computer programming language
does not have have a matrix type.
You probably meant a two dimensional array of numbers.

> The problem is that,
> due to cache behavior, the performance is very poor.
> If anyone knows efficient matrix transpose algorithm
> or has some code can you please help me?

> I only need the transpose operation.
> What the next steps will do is not important to me right now.
> I was wondering if there is any block algorithm
> or something like that which will make use of the cache
> when transposing the matrix.
> If there is a pointer to such algorithm I will be really grateful.

Block algorithms are only useful when they "reuse"
elements of a two dimensional array that have already been cache'd.
The only thing that you can do that might enhance performance of

    B = A^T

is to avoid striding across the rows of the source array A

    const    int m = 4096;
    const    int n = 4096;
    double    A[m][n], B[n][m];
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            B[j][i] = A[i][j];

Because n is small, B[j][i+1] = A[i+1][j]
will still be in the cache if B[j][i] = A[i][j]
caused the processor to cache both
B[j][i] and B[j][i+1] in the same cache block.

> Transpose is taking too much time.
> For example a 4096 x 4096 matrix containing complex numbers
> ( 2 double) is taking about 10 second
> whereas copying takes about 1.6 sec.
> It is currently a bottleneck of what I am doing.
> I don't know of the cache size, overall memory is 512MB.
> The matrix can be rectangular. dimensions are dependent in the run.

The most probable explanation for this time difference
is that one or both of your arrays are not aligned
on quad word boundaries so that retrieving an element of A
or storing an element of B requires two memory references
instead of just one (assuming that your cache block size is 128 bits).
Try something like this

    double alignment_padding;
    doubleComplex A[m][n], B[n][m];

and, if you're lucky, you will get the complex elements
of both A and B aligned on quad word boundaries.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
bv  
View profile  
 More options Dec 5 2002, 7:52 pm
Newsgroups: sci.math.num-analysis
From: bv <b...@Xsdynamix.com>
Date: Thu, 05 Dec 2002 16:50:05 -0800
Local: Thurs, Dec 5 2002 7:50 pm
Subject: Re: Efficient matrix transpose in c

NM wrote:

> Transpose is taking too much time. For example a 4096 x 4096 matrix
> containing complex numbers ( 2 double) is taking about 10 second whereas
> copying takes about 1.6 sec. It is currently a bottleneck of what i am
> doing. I don't know of the cache size, overall memory is 512MB. The matrix
> can be rectangular. dimensions are dependent in the run.

I'm not familiar with these things in C so the following may not apply.
In Fortran,
it makes a big difference if matrix dimensions are the power of two or
not. i.e. changing size to 4097 x 4097 will markedly reduce the
execution of the algorithm.

--
Dr.B.Voh
------------------------------------------------------
Applied Algorithms    http://sdynamix.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
bv  
View profile  
 More options Dec 5 2002, 7:59 pm
Newsgroups: sci.math.num-analysis
From: bv <b...@Xsdynamix.com>
Date: Thu, 05 Dec 2002 16:56:34 -0800
Local: Thurs, Dec 5 2002 7:56 pm
Subject: Re: Efficient matrix transpose in c

NM wrote:

> Transpose is taking too much time. For example a 4096 x 4096 matrix
> containing complex numbers ( 2 double) is taking about 10 second whereas
> copying takes about 1.6 sec. It is currently a bottleneck of what i am
> doing. I don't know of the cache size, overall memory is 512MB. The matrix
> can be rectangular. dimensions are dependent in the run.

I'm not familiar with these things in C so the following may not apply.
In Fortran, it makes a big difference if matrix dimensions are the power
of two or not. i.e. changing size to 4097 x 4097 will significantly
reduce the execution of the algorithm.

--
Dr.B.Voh
------------------------------------------------------
Applied Algorithms    http://sdynamix.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
NM  
View profile  
 More options Dec 16 2002, 4:41 pm
Newsgroups: sci.math.num-analysis
From: "NM" <n...@nm.com>
Date: Mon, 16 Dec 2002 15:41:40 -0600
Local: Mon, Dec 16 2002 4:41 pm
Subject: Re: Efficient matrix transpose in c
I haven't looked many days for any response, as i have implemented a
blocking algorithm. Then I found your result here and I was amazed with your
result, as it was very good and so simple. BUT hey your code is wrong. In
the last 2 transpose algorithm it seems from indentation that there are 2
nested loop. But in fact there is none. The loop over j runs only for the
last value of i which is already graeter than n. So you haven't done any
transposition.

NM

"Pearu Peterson" <pe...@cens.ioc.ee> wrote in message

news:Pine.LNX.4.21.0212051933360.4929-100000@cens.kybi...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pearu Peterson  
View profile  
 More options Dec 18 2002, 6:32 pm
Newsgroups: sci.math.num-analysis
From: Pearu Peterson <pe...@cens.ioc.ee>
Date: Thu, 19 Dec 2002 01:32:07 +0200
Local: Wed, Dec 18 2002 6:32 pm
Subject: Re: Efficient matrix transpose in c

On Mon, 16 Dec 2002, NM wrote:
> I haven't looked many days for any response, as i have implemented a
> blocking algorithm. Then I found your result here and I was amazed with your
> result, as it was very good and so simple. BUT hey your code is wrong.

Oops. You are right and the fix reversed my conclusions. Sorry for the
confusion.

However, after fixing the code, I find the following result interesting:

$ time ./main 1 512 50
size=512, repeat=50 transpose=transpose_inplace_swap

real    0m2.210s
user    0m2.160s
sys     0m0.040s
$ time ./main 1 513 50
size=513, repeat=50 transpose=transpose_inplace_swap

real    0m1.148s
user    0m1.100s
sys     0m0.030s

That is, by changing the size of a matrix from 512 to 513, almost doubles
the efficiency.

Pearu


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »