Hi Ilkhom,
There's no need to retile A. In fact, I'm not exactly sure what you mean by "retile".
Are you using square tiles for A, B, and C (aside from rectangular tiles on the right and bottom edges)? It's possible to use rectangular tiles, but that complicates lots of things: the tiles of A, B, and C have to be conformant in order to multiply them, and the GPU implementation is more complicated and less efficient if the tiles aren't square.
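To illustrate what "conformant" means for C = A * B, here is a minimal sketch in plain C++. It does not use SLATE; `tile_sizes`, `Tiling`, and `conformable` are hypothetical names made up for this example. It only checks that the tile widths of A match the tile heights of B, and that C's tiling matches the rows of A and the columns of B.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sizes of the tiles along one dimension of length n with nominal tile size nb.
// Every tile has size nb except possibly a smaller last one (the edge tile).
std::vector<int> tile_sizes(int n, int nb) {
    assert(n >= 0 && nb > 0);
    std::vector<int> sizes;
    for (int start = 0; start < n; start += nb)
        sizes.push_back(std::min(nb, n - start));
    return sizes;
}

// Hypothetical description of a tiled matrix (not a SLATE type).
struct Tiling {
    std::vector<int> row_sizes;  // heights of the tile rows
    std::vector<int> col_sizes;  // widths of the tile columns
};

// C = A * B can proceed tile by tile only if the tilings line up.
bool conformable(const Tiling& A, const Tiling& B, const Tiling& C) {
    return A.col_sizes == B.row_sizes   // inner dimension
        && A.row_sizes == C.row_sizes   // rows of the result
        && B.col_sizes == C.col_sizes;  // columns of the result
}
```

With one square tile size for all three matrices these checks pass automatically. With rectangular tiles, you have to pick the tile heights of B to match the tile widths of A yourself.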
Be aware that SLATE does shallow copies, so after this:
A = C;
A and C refer to exactly the same underlying object. If you modify C, then A reflects the same changes. This will not do what you want:
for (int iter = 0; iter < niter; ++iter) {  // loop header assumed; niter is a placeholder
    slate::multiply( A, B, C );
    A = C;  // Not what you want: A and C now refer to the same object.
}
On the 2nd and subsequent iterations, it would effectively be doing:
slate::multiply( C, B, C );
Instead, you may want to swap their contents (which just swaps pointers and metadata, so it's cheap):
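for (int iter = 0; iter < niter; ++iter) {  // loop header assumed; niter is a placeholder
    slate::multiply( A, B, C );
    swap( A, C );  // A now holds the newest product; C is reused as the next output.
}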
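To make the aliasing concrete, here is a toy sketch in plain C++ that mimics shallow-copy semantics with a `std::shared_ptr`. `ToyMatrix`, `toy_multiply`, `copy_aliases`, and `power_iterate` are hypothetical names for this example only. They are not SLATE classes or functions, and SLATE's internals may well differ. The sketch only shows why assignment shares storage while swap exchanges handles.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a matrix handle with shallow-copy semantics.
// This is NOT SLATE's Matrix class; it only mimics the sharing behavior.
struct ToyMatrix {
    int m, n;
    std::shared_ptr<std::vector<double>> data;  // shared by all copies

    ToyMatrix(int m_, int n_, double value)
        : m(m_), n(n_),
          data(std::make_shared<std::vector<double>>(std::size_t(m_) * n_, value)) {}

    double& at(int i, int j)       { return (*data)[std::size_t(i) * n + j]; }
    double  at(int i, int j) const { return (*data)[std::size_t(i) * n + j]; }
};

// Naive C = A * B. Assumes C does not share storage with A or B.
void toy_multiply(const ToyMatrix& A, const ToyMatrix& B, ToyMatrix& C) {
    assert(A.n == B.m && C.m == A.m && C.n == B.n);
    for (int i = 0; i < C.m; ++i)
        for (int j = 0; j < C.n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < A.n; ++k)
                sum += A.at(i, k) * B.at(k, j);
            C.at(i, j) = sum;
        }
}

// After "A = C", both handles point at the same storage.
bool copy_aliases() {
    ToyMatrix A(2, 2, 1.0), C(2, 2, 0.0);
    A = C;                      // shallow copy: A now shares C's storage
    C.at(0, 0) = 42.0;          // modify C ...
    return A.at(0, 0) == 42.0;  // ... and A sees the change
}

// Repeated multiply with swap: returns entry (0,0) of A * B^niter
// for 2-by-2 matrices of ones.
double power_iterate(int niter) {
    ToyMatrix A(2, 2, 1.0), B(2, 2, 1.0), C(2, 2, 0.0);
    for (int iter = 0; iter < niter; ++iter) {
        toy_multiply(A, B, C);
        std::swap(A, C);  // swaps handles only; A holds the newest product
    }
    return A.at(0, 0);
}
```

In this toy model, `copy_aliases()` returns true, and `power_iterate(3)` returns 8 because each multiply by the all-ones B doubles every entry. The swap costs the same regardless of matrix size, since only the handles move.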
I'm inferring that A and C are the same size, m-by-n, and B is n-by-n. Possibly m = n, so they're all square, but that's not required by your statement.
Mark
Interim Director, Innovative Computing Laboratory (ICL)
Research Assistant Professor, University of Tennessee, Knoxville