Thanks,
===============================================================
Gregory V. Wilson || Computing and Information Science
gwi...@cs.uoregon.edu || University of Oregon
===============================================================
[1] John T. Feo, David C. Cann, and Rodney R. Oldehoeft: "A Report on the
Sisal Language Project." Journal of Parallel and Distributed Computing,
10, 349-366, 1990.
> What are the prospects for support for true multi-dimensional
> arrays to be included in the next revision of the Report?
I hope the prospects are nil. Why can't this be included as part
of a numerical analysis library?
> Members of the SISAL project have stated that "Implementing
> multidimensional arrays as arrays of arrays was our greatest
> single mistake" [1].
Without finding and reading the article, I have no idea why they
said this. Could you give us a brief summary?
> [M]uch of the numerical work I'm doing or planning to do would
> be simpler to code, and easier to vectorise, if these were
> supported.
I don't understand what language features you want that can't be
supported on top of R4RS, particularly in light of the addition
of define-syntax.
>[1] John T. Feo, David C. Cann, and Rodney R. Oldehoeft: "A Report on the
> Sisal Language Project." Journal of Parallel and Distributed Computing,
> 10, 349-366, 1990.
--
--johnl
Regular arrays are such common structures in numerical codes
(99.random-value per cent of all structures) that making them fast is
important (to people like me, anyway). Of course, everything could be
supported on top of R4RS; however, it would (almost certainly) run faster if
implemented intrinsically, if MDAs were guaranteed to be laid out
contiguously in memory, etc.
SISAL presently implements MDAs as vectors of vectors (of vectors...).
While this allows for ragged arrays, Feo et al. point out in their article
that few numerical programs use this facility. On the other hand, if MDAs
had been intrinsic, and their layout in memory fixed, it would have been
possible for the compiler to do much more in the way of vectorisation.
Cheers,
Perhaps for the same reason that 1-dimensional arrays are not relegated to a
numerical analysis library: so that compilers, knowing about them, can optimize
codes that use them.
> > Members of the SISAL project have stated that "Implementing
> > multidimensional arrays as arrays of arrays was our greatest
> > single mistake" [1].
>
> Without finding and reading the article, I have no idea why they
> said this. Could you give us a brief summary?
>
Many array optimizations take advantage of knowledge about 2d-array layout
(similarly for 3-d, ...). These optimizations are often crucial because they are in
the most frequently executed code (inner loops).
For example, suppose 2-d arrays were laid out in row-major order (*), i.e.,
rightmost index varying fastest, like an odometer. Given a loop:
do i ...
... A[i,j] ...
... B[j,i] ...
good Fortran compilers can lift out A+j and B+N*j (where N is the size of each row)
as loop invariants, essentially turning both accesses into pointer scans,
vertically down column j of A with stride N, and horizontally across row j of B with
stride 1. With arrays of arrays, you can't do the former: you pay N extra memory
accesses.
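In Scheme terms, the same strength reduction falls out if the 2-D array is
stored row-major in one flat vector. A sketch (the representation and the
name column-sum are my own assumptions, nothing from R4RS):

```scheme
;; Sum column j of a rows-by-n, row-major matrix stored in a flat
;; vector.  The address of element [i,j] is (+ (* n i) j); the j part
;; is loop-invariant, so lift it out and step the offset by the
;; stride n each iteration -- the pointer scan described above.
(define (column-sum v n rows j)
  (let loop ((offset j) (i 0) (sum 0))
    (if (= i rows)
        sum
        (loop (+ offset n) (+ i 1)
              (+ sum (vector-ref v offset))))))

;; (column-sum (vector 1 2 3
;;                     4 5 6) 3 2 1)  => 7   ; column 1 holds 2 and 5
```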
Similarly, consider:
do i ...
do j ...
... A[i,j] ... A[i-1,j] ...
A good Fortran compiler can keep a single pointer in a register for both loads (one
direct, one at offset N). With arrays of arrays, you need to keep two separate
pointers, one for row i and one for row i-1.
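The same point in Scheme, again assuming the flat row-major representation
(a sketch with hypothetical names): with one flat vector both loads share a
single offset, while with a vector of vectors each load must first fetch its
own row.

```scheme
;; Flat row-major: A[i,j] and A[i-1,j] are one offset and that offset
;; minus n.  One "pointer", two fixed displacements.
(define (stencil-flat v n i j)
  (let ((offset (+ (* n i) j)))
    (+ (vector-ref v offset)
       (vector-ref v (- offset n)))))

;; Vector of vectors: two separate row lookups are unavoidable --
;; the analogue of keeping two pointers in registers.
(define (stencil-vov m i j)
  (+ (vector-ref (vector-ref m i) j)
     (vector-ref (vector-ref m (- i 1)) j)))
```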
Finally, many, many optimizations (vectorization, pipeline scheduling, loop
transformations, ...) rely on accurate (non-) aliasing information, derived from
dependence analysis. With two dimensional arrays, the compiler knows trivially, for
example, that A[i-1,j] and A[i,j] are distinct locations (so that a store to one
does not constrain scheduling of a load to the other). With arrays of arrays, the
compiler has the difficult job of having to PROVE that A[i-1][j] and A[i][j] do not
alias.
Nikhil (nik...@crl.dec.com)
(*) I know, I know, Fortran lays out 2-d arrays in column major order ...
and in response to my query:
> Of course, everything could be supported on top of R4RS;
> however, it would (almost certainly) run faster if implemented
> intrinsically, if MDA's were guaranteed to be laid out
> contiguously in memory, etc.
> [....]
> On the other hand, if MDAs had been intrinsic, and their layout
> in memory fixed, it would have been possible for the compiler
> to do much more in the way of vectorisation.
I carefully read through the relevant portions of R4RS to be sure
my memory was correct, and it was: R4RS makes no claim and states
no requirement that a vector be allocated as a single contiguous
slab of memory. In fact, about all it has to say on the matter
is that
A vector typically occupies less space than a list
of the same length, and the average time required
to access a randomly chosen element is typically
less for the vector than for the list.
as well as providing essential procedures for random access.
On the other hand, Dylan both has multi-dimensional arrays and
requires them to have constant access time. I don't think that
allows for anything but contiguous allocation, but strange things
can happen on parallel machines, and I make no promises.
I hope this doesn't start another flame war, but I don't want
Scheme extended in the name of efficiency. In the name of
expressiveness, yes. (As an aside, I'm all for a core language
with essential libraries that can be, but need not be,
implemented using the core language.)
--
--johnl
Hm, do you mean nil, NIL, (), or #F?
But you're right, if we blindly go ahead and include such untested and
stupid things as the oldest data structure in computing, people are
going to start wondering about our sanity. I say, let's keep up our
perfect record of including only tried and true features like CALL/CC.
(I admit some might wonder why we included something no one had ever
heard of before and was not part of any language prior to Scheme, but
those idiots just haven't combed Plato's heaven the way we have. A
more reasonable proposal is my
EXPLORE-THE-RIGHT-NONDETERMINISTIC-BRANCH function, for which I'll
show a log(n) implementation in the next POPL.)
-rpg-
There is a basic tension here between small, elegant languages
and large, efficient languages. The reason efficient
languages are large is that typically the price for efficiency is
specificity, so in order to have a general-purpose language, the
number of constructs must grow.
I think the answer for Scheme is a small core language, smaller
than R4RS, but certainly including macros, together with a
standard library of routines that *could* be implemented using
the core language, but need not be. (I think this is under
consideration for R5RS in any case. Something similar is, at
least.)
One benefit is that you can get a complete compiler by
implementing the core language, once the standard library has
been written using the core. You can also have, say, numerical
analysis Scheme compilers, where the relevant operations are
heavily optimized, perhaps at the cost of some others. But this
would be an issue of the compiler, not the language.
An implication of this approach to the issue at hand is that the
printed representation would be a vector of vectors, but the
compiler would see multidimensional array primitives, with all
the optimization benefits of that.
In this way, we separate expressiveness from usefulness, which
seems to be a nice distinction to make.
--
--johnl
Hmmm... so what happens if I'm a Scheme, and I read in a vector of vectors?
Do I:
a) assume that I'm to create a multi-dimensional array (in which case,
code which later replaces one of the sub-vectors with a sub-vector
of a different size fails); or
b) create a vector of vectors (in which case I can't make any assumptions
about strides through memory, and can't vectorise, or take advantage of
other pipelining features of my hardware).
> I think the answer for Scheme is a small core language, smaller
> than R4RS, but certainly including macros, together with a
> standard library of routines that *could* be implemented using
> the core language, but need not be. (I think this is under
> consideration for R5RS in any case. Something similar is, at
> least.)
...
> An implication of this approach to the issue at hand is that the
> printed representation would be a vector of vectors, but the
> compiler would see multidimensional array primitives, with all
> the optimization benefits of that.
This approach (which I think is a good one) says nothing about the
printed representation. The library could provide functions to
manipulate multidimensional arrays without guaranteeing anything at
all about their printed representation or their representation.
david carlton
car...@husc.harvard.edu
Laundry is the fifth dimension!! ...um...um... th' washing
machine is a black hole and the pink socks are bus drivers who
just fell in!!
| The reason I joined this discussion is because I have
| some concerns about language design, and Scheme in particular,
| and not because I have some drum to beat.
I think you've just contradicted yourself.
| In this way, we separate expressiveness from usefulness, which
| seems to be a nice distinction to make.
What is expressiveness if it is not the ability to express what the
user wants done?
You are saying, I care about cwcc and who knows what else because I
don't want to write my own interpreter/compiler. Why not? Because it
is a lot of work to get one right and working efficiently.
Other people want arrays because they don't want to write their own.
Why not? Because it is a lot of work to get them right and working
efficiently.
What's the difference?
Don't be snobbish here. Think about it from the perspective of a user,
not a language designer. A language is _just_ a tool, not a beautiful
mathematical abstraction. I hardly care how beautiful a hammer is, if
it does not drive my nail...
In article <1993Feb19.1...@news.cs.indiana.edu>,
"John Lacey" <jo...@cs.indiana.edu> writes:
> An implication of this approach to the issue at hand is that the
> printed representation would be a vector of vectors, but the
> compiler would see multidimensional array primitives, with all
> the optimization benefits of that.
> This approach (which I think is a good one) says nothing about the
> printed representation. The library could provide functions to
> manipulate multidimensional arrays without guaranteeing anything at
> all about their printed representation or their representation.
Right.
The most peculiar thing to me about this discussion of arrays for Scheme is
the seemingly widespread notion that arrays should be somehow identified
with vectors that contain vectors. (One common corollary of this belief
seems to be that if you don't have native arrays, you -have- to use vectors
of vectors -- which is inefficient.)
Well, let me point out that the difference between a vector and an array is
really just some arithmetic performed on the indices. If I'm writing a
program that does a lot of manipulating of 4x4 matrices, I can do as well
as any native array implementation by using 16 element vectors, and writing
(vector-ref M (+ (* 4 i) j))
instead of
(array-ref M i j)
If I write my loops so that, in fact, I process the elements of the matrix
in the order that they are laid out in the vector, then a good compiler
should be able to do just as well as it could if I used native arrays.
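Concretely, for the 4x4 case, the whole "array" layer is a pair of
one-line index calculations. A sketch (m4-ref, m4-set!, and m4-update!
are hypothetical names of mine):

```scheme
;; 4x4 matrix as a 16-element row-major vector; the "array"
;; operations are nothing but index arithmetic.
(define (m4-ref M i j)    (vector-ref  M (+ (* 4 i) j)))
(define (m4-set! M i j x) (vector-set! M (+ (* 4 i) j) x))

;; Replace each element with (f i j old-value).  With j innermost,
;; this walks the vector with stride 1 -- the access pattern a good
;; compiler can turn into a straight scan.
(define (m4-update! M f)
  (do ((i 0 (+ i 1))) ((= i 4))
    (do ((j 0 (+ j 1))) ((= j 4))
      (m4-set! M i j (f i j (m4-ref M i j))))))
```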
In order to drive this point home, here is an implementation of arrays for
Scheme that works by abstracting out the index arithmetic. This makes it
trivial to provide shared subarrays. In fact, this approach gets you a lot
more than you could ever get by simply using vectors of vectors.
(This code has -not- been thoroughly debugged -- near the end there are a
bunch of little functions that I wrote by hand that have not been
exhaustively tested and could easily contain typos.)
------- Begin arrays.scm
; Arrays for Scheme
;
; Copyright (C) 1993 Alan Bawden
;
; You may use this code any way you like, as long as you don't charge money
; for it, remove this notice, or hold me liable for its results.
; The user interface consists of the following 5 functions
;
; (ARRAY? <object>) => <boolean>
; (MAKE-ARRAY <initial-value> <bound> <bound> ...) => <array>
; (ARRAY-REF <array> <index> <index> ...) => <value>
; (ARRAY-SET! <array> <new-value> <index> <index> ...)
; (MAKE-SHARED-ARRAY <array> <mapper> <bound> <bound> ...) => <array>
;
; When constructing an array, <bound> is either an inclusive range of
; indices expressed as a two element list, or an upper bound expressed as a
; single integer. So
;
; (make-array 'foo 3 3)
;
; and
;
; (make-array 'foo '(0 2) '(0 2))
;
; are equivalent.
;
; MAKE-SHARED-ARRAY can be used to create shared subarrays of other arrays.
; The <mapper> is a function that translates coordinates in the new array
; into coordinates in the old array. A <mapper> must be linear, and its
; range must stay within the bounds of the old array, but it can be
; otherwise arbitrary. A simple example:
;
; (define fred (make-array #F 8 8))
; (define freds-diagonal
; (make-shared-array fred (lambda (i) (list i i)) 8))
; (array-set! freds-diagonal 'foo 3)
; (array-ref fred 3 3) => FOO
; (define freds-center
; (make-shared-array fred (lambda (i j) (list (+ 3 i) (+ 3 j))) 2 2))
; (array-ref freds-center 0 0) => FOO
;
; End of manual.
; If you comment out the bounds checking code, this is about as efficient
; as you could ask for without help from the compiler.
;
; An exercise left to the reader: implement the rest of APL.
;
; Functions not part of the user interface have names that start with
; "array/" (the poor man's package system).
;
; This assumes your Scheme has the usual record type package.
(define array/rtd
(make-record-type "Array"
'(vector
indexer ; Must be a -linear- function!
shape)))
(define array/vector (record-accessor array/rtd 'vector))
(define array/indexer (record-accessor array/rtd 'indexer))
(define array/shape (record-accessor array/rtd 'shape))
(define array? (record-predicate array/rtd))
(define array/construct
(record-constructor array/rtd '(vector indexer shape)))
(define (array/compute-shape specs)
(map (lambda (spec)
(cond ((and (integer? spec)
(< 0 spec))
(list 0 (- spec 1)))
((and (pair? spec)
(pair? (cdr spec))
(null? (cddr spec))
(integer? (car spec))
(integer? (cadr spec))
(<= (car spec) (cadr spec)))
spec)
(else (error "Bad array dimension: ~S" spec))))
specs))
(define (make-array initial-value . specs)
(let ((shape (array/compute-shape specs)))
(let loop ((size 1)
(indexer (lambda () 0))
(l (reverse shape)))
(if (null? l)
(array/construct (make-vector size initial-value)
(array/optimize-linear-function indexer
(length shape))
shape)
(loop (* size (+ 1 (- (cadar l) (caar l))))
(lambda (first-index . rest-of-indices)
(+ (* size (- first-index (caar l)))
(apply indexer rest-of-indices)))
(cdr l))))))
(define (make-shared-array array mapping . specs)
(let ((new-shape (array/compute-shape specs))
(old-indexer (array/indexer array)))
(let check ((indices '())
(bounds (reverse new-shape)))
(cond ((null? bounds)
(array/check-bounds array (apply mapping indices)))
(else
(check (cons (caar bounds) indices) (cdr bounds))
(check (cons (cadar bounds) indices) (cdr bounds)))))
(array/construct (array/vector array)
(array/optimize-linear-function
(lambda indices
(apply old-indexer (apply mapping indices)))
(length new-shape))
new-shape)))
(define (array/in-bounds? array indices)
(let loop ((indices indices)
(shape (array/shape array)))
(if (null? indices)
(null? shape)
(let ((index (car indices)))
(and (not (null? shape))
(integer? index)
(<= (caar shape) index (cadar shape))
(loop (cdr indices) (cdr shape)))))))
(define (array/check-bounds array indices)
(or (array/in-bounds? array indices)
(error "Bad indices for ~S: ~S" array indices)))
(define (array-ref array . indices)
(array/check-bounds array indices)
(vector-ref (array/vector array)
(apply (array/indexer array) indices)))
(define (array-set! array new-value . indices)
(array/check-bounds array indices)
(vector-set! (array/vector array)
(apply (array/indexer array) indices)
new-value))
; STOP! Do not read beyond this point on your first reading of
; this code -- you should simply assume that the rest of this file
; contains only the following single definition:
;
; (define (array/optimize-linear-function f d) f)
;
; Of course everything would be pretty inefficient if this were really the
; case, but it isn't. The following code takes advantage of the fact that
; you can learn everything there is to know from a linear function by
; simply probing around in its domain and observing its values -- then a
; more efficient equivalent can be constructed.
(define (array/optimize-linear-function f d)
(cond ((= d 0)
(array/0d-c (f)))
((= d 1)
(let ((c (f 0)))
(array/1d-c0 c (- (f 1) c))))
((= d 2)
(let ((c (f 0 0)))
(array/2d-c01 c (- (f 1 0) c) (- (f 0 1) c))))
((= d 3)
(let ((c (f 0 0 0)))
(array/3d-c012 c (- (f 1 0 0) c) (- (f 0 1 0) c) (- (f 0 0 1) c))))
(else
(let* ((v (make-list d 0))
(c (apply f v)))
(let loop ((p v)
(old-val c)
(coefs '()))
(cond ((null? p)
(array/Nd-c* c (reverse coefs)))
(else
(set-car! p 1)
(let ((new-val (apply f v)))
(loop (cdr p)
new-val
(cons (- new-val old-val) coefs))))))))))
; 0D cases:
(define (array/0d-c c)
(lambda () c))
; 1D cases:
(define (array/1d-c c)
(lambda (i0) (+ c i0)))
(define (array/1d-0 n0)
(cond ((= 1 n0) +)
(else (lambda (i0) (* n0 i0)))))
(define (array/1d-c0 c n0)
(cond ((= 0 c) (array/1d-0 n0))
((= 1 n0) (array/1d-c c))
(else (lambda (i0) (+ c (* n0 i0))))))
; 2D cases:
(define (array/2d-0 n0)
(lambda (i0 i1) (+ (* n0 i0) i1)))
(define (array/2d-1 n1)
(lambda (i0 i1) (+ i0 (* n1 i1))))
(define (array/2d-c0 c n0)
(lambda (i0 i1) (+ c (* n0 i0) i1)))
(define (array/2d-c1 c n1)
(lambda (i0 i1) (+ c i0 (* n1 i1))))
(define (array/2d-01 n0 n1)
(cond ((= 1 n0) (array/2d-1 n1))
((= 1 n1) (array/2d-0 n0))
(else (lambda (i0 i1) (+ (* n0 i0) (* n1 i1))))))
(define (array/2d-c01 c n0 n1)
(cond ((= 0 c) (array/2d-01 n0 n1))
((= 1 n0) (array/2d-c1 c n1))
((= 1 n1) (array/2d-c0 c n0))
(else (lambda (i0 i1) (+ c (* n0 i0) (* n1 i1))))))
; 3D cases:
(define (array/3d-01 n0 n1)
(lambda (i0 i1 i2) (+ (* n0 i0) (* n1 i1) i2)))
(define (array/3d-02 n0 n2)
(lambda (i0 i1 i2) (+ (* n0 i0) i1 (* n2 i2))))
(define (array/3d-12 n1 n2)
(lambda (i0 i1 i2) (+ i0 (* n1 i1) (* n2 i2))))
(define (array/3d-c12 c n1 n2)
(lambda (i0 i1 i2) (+ c i0 (* n1 i1) (* n2 i2))))
(define (array/3d-c02 c n0 n2)
(lambda (i0 i1 i2) (+ c (* n0 i0) i1 (* n2 i2))))
(define (array/3d-c01 c n0 n1)
(lambda (i0 i1 i2) (+ c (* n0 i0) (* n1 i1) i2)))
(define (array/3d-012 n0 n1 n2)
(cond ((= 1 n0) (array/3d-12 n1 n2))
((= 1 n1) (array/3d-02 n0 n2))
((= 1 n2) (array/3d-01 n0 n1))
(else (lambda (i0 i1 i2) (+ (* n0 i0) (* n1 i1) (* n2 i2))))))
(define (array/3d-c012 c n0 n1 n2)
(cond ((= 0 c) (array/3d-012 n0 n1 n2))
((= 1 n0) (array/3d-c12 c n1 n2))
((= 1 n1) (array/3d-c02 c n0 n2))
((= 1 n2) (array/3d-c01 c n0 n1))
(else (lambda (i0 i1 i2) (+ c (* n0 i0) (* n1 i1) (* n2 i2))))))
; ND cases:
(define (array/Nd-* coefs)
(lambda indices (apply + (map * coefs indices))))
(define (array/Nd-c* c coefs)
(cond ((= 0 c) (array/Nd-* coefs))
(else (lambda indices (apply + c (map * coefs indices))))))
------- End arrays.scm
--
Alan Bawden Al...@LCS.MIT.EDU
Work: MIT Room NE43-510, 545 Tech. Sq., Cambridge, MA 02139 (617) 253-7328
Home: 29 Reed St., Cambridge, MA 02140 (617) 492-7274
Regards, [Ag]
> Perhaps for the same reason that 1-dimensional arrays are not relegated to a
> numerical analysis library: so that compilers, knowing about them,
> can optimize codes that use them.
Agreed, but is it really worth it unless changes are also made to the
numeric hierarchy and/or declarations to allow the compiler to know
that you have an array of floats ... etc?
> [ examples of how a compiler can optimise 2D array usage ]
> With two dimensional arrays, the compiler knows trivially, for
> example, that A[i-1,j] and A[i,j] are distinct locations (so that a
> store to one does not constrain scheduling of a load to the other).
> With arrays of arrays, the compiler has the difficult job of having
> to PROVE that A[i-1][j] and A[i][j] do not alias.
What's wrong with storing them in a single vector and hiding the
address calculation behind syntax or a procedure?
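For instance, the "syntax" half of that suggestion can be sketched on top
of R4RS define-syntax like this (the two-slot representation and the names
mat-ref/make-mat are assumptions of mine, not a proposal):

```scheme
;; Hide the address calculation behind syntax: MAT-REF expands
;; directly into the flat vector-ref, so the compiler sees only
;; vector operations and arithmetic it can strength-reduce.
;; Assumed representation: (vector n-cols elements), row-major.
(define-syntax mat-ref
  (syntax-rules ()
    ((_ m i j)
     (vector-ref (vector-ref m 1)
                 (+ (* (vector-ref m 0) i) j)))))

(define (make-mat rows cols init)
  (vector cols (make-vector (* rows cols) init)))
```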
bevan
PS I apologise if the above suggestion has already been beaten to
death, news seems to be coming in very slowly at the moment. The
article to which I'm responding is dated the 18th but I only received
it today (25th).