I remember having that same concern back when I started using f90. I
stumbled across the following. If you have a subprogram with an explicit
interface and an intent(in) assumed-shape matrix argument, then you can
use the loc() or c_loc() functions to compare the memory locations of
the actual and dummy arguments. What I found was that the transpose(A)
actual argument and the dummy argument memory locations are the same.
This is not required by the standard (any of them), but it was a nice
surprise to see that optimization being performed. I have looked at
several compilers over the years, and this seems to be a common
optimization.
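For anyone who wants to try the experiment, here is a minimal sketch
(module and variable names are mine, and the transfer() call is just one
portable way to print a c_ptr as an integer):

```fortran
! Sketch of the address-comparison experiment described above.
module check_mod
   use iso_c_binding, only: c_loc
   implicit none
contains
   subroutine look(d)
      ! assumed-shape, intent(in) dummy behind an explicit interface
      real, intent(in), target :: d(:,:)
      print *, 'dummy  address:', transfer(c_loc(d(1,1)), 0_8)
   end subroutine look
end module check_mod

program main
   use check_mod
   use iso_c_binding, only: c_loc
   implicit none
   real, target :: a(3,4)
   a = 0.0
   print *, 'actual address:', transfer(c_loc(a(1,1)), 0_8)
   ! if the two addresses match, the compiler did a shallow transpose
   call look(transpose(a))
end program main
```

If the compiler performs the optimization, the two printed addresses are
identical; if it materializes a temporary for transpose(a), they differ.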
What is happening, of course, is that the compiler is creating a new
data structure for the transposed array, and the row and column
addressing values for increments and offsets for the elements are being
switched, but the actual memory is the same for the actual and the dummy
arrays. In effect, a "shallow transpose" is being done, analogous to a
shallow copy that reuses pointer addresses rather than a deep copy that
actually duplicates the data. As a result, no matter how large the array
argument is, the calling sequence takes O(1) time. It is just the
metadata that is modified in the call, not the actual array data.
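The metadata swap can be illustrated with explicit index arithmetic. The
little program below is only a model of what the compiler's descriptor
does: for a column-major array A(n,m), element (i,j) lives at offset
(i-1) + (j-1)*n from the base address, and the transpose view simply
swaps the two stride factors in that formula.

```fortran
! Model of the "increments and offsets being switched" for a 3x4 array.
program stride_demo
   implicit none
   integer, parameter :: n = 3, m = 4
   real    :: flat(n*m)          ! the single block of memory
   integer :: i, j
   flat = [(real(i), i = 1, n*m)]
   ! element (i,j) of A:            flat(1 + (i-1)   + (j-1)*n)
   ! element (i,j) of transpose(A): flat(1 + (i-1)*n + (j-1))
   i = 2; j = 3
   print *, flat(1 + (i-1) + (j-1)*n)   ! A(2,3), value 8.0
   print *, flat(1 + (i-1)*n + (j-1))   ! transpose(A)(2,3) = A(3,2), value 6.0
end program stride_demo
```

No element of flat() is ever moved or copied; only the addressing
formula changes, which is why the operation is O(1) in the array size.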
At the times I have looked at this, it did not work for other kinds of
declarations. Explicit-shape dummy arguments don't work that way because
they must be contiguous, so a deep transpose must be done, with
copy-in/copy-out. Intent(inout) dummy arguments do not work this way
because the actual argument transpose(A) is considered to be an
expression, and an expression cannot be associated with an intent(inout)
dummy. However, if you think about it, the same shallow transpose
operation would work in that case too. For this reason, I have suggested that there
should be a new shallow transpose operator that is directly available to
the programmer that works the same way as this common optimization, but
one that would not be considered an expression. Its only task is to
manipulate the metadata for arrays. It would always be an O(1)
operation, no matter the size of the array. In addition to actual
arguments, it could also be used in expressions and on the left-hand
side of assignments, the same way that the transpose operation can be
used in mathematical expressions. There is currently no way to do a
pointer assignment to the transpose of an array -- this would allow that
in a straightforward way. The programmer could do things like
B => shallow_transpose(A)
shallow_transpose(B) => A
B = expression...shallow_transpose(A)...expression
B = function(..., shallow_transpose(A), ...)
call some_subroutine(..., shallow_transpose(A), ...)
and know that all of those transpose operations would be done with O(1)
effort. Currently, it is up to the compiler to decide which transpose
operations are shallow (and cheap) and which ones are deep (and
expensive). This would give the programmer more control over the
low-level operations that are being performed.
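To make the current limitation concrete, here is roughly what can and
cannot be written today (variable names are mine):

```fortran
! What a programmer can and cannot write in current Fortran.
program today
   implicit none
   real, target  :: a(3,4)
   real, pointer :: b(:,:)
   real          :: t(4,3)
   a = 1.0
   t = transpose(a)    ! legal today, but a deep O(n*m) copy
 ! b => transpose(a)   ! illegal today: no pointer assignment to a transpose
   b => a              ! a pointer can alias a, but not its transpose
end program today
```

The commented-out line is exactly what a shallow transpose operator
would make legal, with guaranteed O(1) cost.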
BTW, I do not think there is any need to introduce an intrinsic MATRIX
type in fortran. That would introduce much complication to the language,
requiring, for example, separate subprograms to be written for various
combinations of array and MATRIX arguments, new conversion routines to
be defined, new semantics for assignments, pointer assignments, actual
and dummy arguments, and mixed-mode arithmetic, and so on. Consider,
for example, the nightmare of maintaining the LAPACK library if it had
to support all combinations of array and MATRIX arguments. Instead, I
think exposing
the array metadata to the programmer, even if it is limited to 2D arrays
in some cases, would be generally more productive.
$.02 -Ron Shepard