Space efficient sorting of arrays

14 views
Skip to first unread message

Ric Sherlock

unread,
Dec 4, 2001, 7:41:23 PM12/4/01
to
Hi all,
I've been trundling along fine up until now, sorting a 2d array by one of
its columns as follows:
myarray{<-}myarray[{gradeup}myarray[;3];]

Now I'm trying to sort large arrays (>100,000 rows) and am running into
WSFULL problems.
Does anyone have any tips on how to make this sorting more efficient?

Cheers,
Ric Sherlock


Morten Kromberg

unread,
Dec 4, 2001, 9:03:56 PM12/4/01
to
Hi Ric,

If you have 100,000 rows and (say) 10 columns of floating-point data (8
bytes for each number), this array will consume 8Mb. APL will need 2 copies
of the data while it reorders the array, so you will need a workspace in
excess of 16Mb to complete the operation. If you use an external file, and
write then read "chunks" of the array you can reduce the space requirement
somewhat, but before we get into that, perhaps you can answer a couple of
questions:

1) How big is your workspace?
2) Which APL system are you using?

Regards,

Morten Kromberg

-----Original Message-----
From: APL Language Discussion [mailto:AP...@unb.ca]On Behalf Of News
Gateway
Sent: 5. december 2001 02:31
To: AP...@LISTSERV.UNB.CA
Subject: Space efficient sorting of arrays


X-From: "Ric Sherlock" <R.G.Sh...@massey.ac.nz>

Mike Kent

unread,
Dec 4, 2001, 10:01:05 PM12/4/01
to

Don't. Buy memory.

It's dead cheap these days ... 256 MB of registered ECC PC 133
can be had for about $50 from crucial.com.

If you've maxed out the machne already ...


It's not too hard to sort space-efficiently:


@ Uses origin 0

g {<-} {gradeup} A
i[g] {<-} g {<-}{iota}{rho} g @ Invert grade vector

@ Now note that A[i;] {<-} A
@ is the same as A {<-} A[g;]

@ and we may do the latter a step at a time:

k {first}{rho} g

:While k > 0
k {<-} k - 1
:While i[k] {/=} k
A[k,i[k];] {<-} A[i[k],k;]
i[k,i[k]] {<-} i[i[k],k]
:EndWhile
:EndWhile

This is going to be rather slow for large arrays ...
10 seconds for a 100000x4 integer matrix (and
within 10% of the same for a 100000-item vector)
on a machine where the {gradeup} takes 0.32 sec
(matrix) or .13 sec (vector), and the direct
reindexing takes another 0.04 (matrix) or 0.01
(vector) seconds.

It's not doing a lot of extra iteration BTW ...
each iteration puts at least one more row/index
into the right place, and you must at least do
the comparison in the outer loop once per row
so that in the end the cost, after the grade,
is linear. Whopping big linear coefficient,
though.

At the cost of more complexity, you could work
on blocks of the data, swapping the out-of-place
items in a block into place until there are 0
out-of-place items in the block, and then working
on the next block. Note that in the limit, where
the block size is equal to the first axis length
of A, this amounts to A[i;] {<-} A and so the only
extra cost is the inversion of the permutation,
which is cheap. This suggests that the gain from
blocking is likely to be substantial and you may
want to set a block size that's just small enough
to reliably avoid WS full.

Ric Sherlock

unread,
Dec 4, 2001, 10:02:46 PM12/4/01
to
Hi Morten,
I have tried workspace sizes upto 128MB and the program usually falls over
once the array gets up around 120,000 rows. The array I'm working with has
43 odd columns, so using your example, that would translate to 34.4MB (or
64.8MB required for sorting).

Part of the problem is that there are a number of other large variables that
are using up a decent amount of the workspace, and I'm sure I could increase
the size available for sorting by storing some of them to disk. Up until
now though I'd been trying to stay away from that to keep things simple &
quick.
I'm using APL+Win version 3.6.
Regards,
Ric

"Morten Kromberg" <mkro...@insight.dk> wrote in message
news:20011204.2...@sol.sun.csd.unb.ca...

Tomas Gustafsson

unread,
Dec 5, 2001, 5:07:49 AM12/5/01
to
1) If you have 43 columns, am i right when assuming you have lots of mixed
data types? If you have only numeric data, did you know that the most
space-consuming data type (8-byte float) sets the required space consumption
for every other cell? I.e. if you have 10 by 10 of integers between 0 and
127, the array size is 120 bytes (Dyalog APL). Then, if you set the value of
any single cell to a decimal value, the size increases to 820 bytes... just
because of that single float. Solution: Split up the matrix into a few
separate ones by column data type. Sort them separately. But note that this
may apply if you have text data in the matrix as well, or other levels of
enclosure.

2) Maybe better solution: Do not sort at all. Just keep the sort order
vector around and interact with the matrix through that. This is a very thin
layer of rearranging. To retrieve the first 5 rows from the 'sorted' matrix,
have something like:

M[order[1 2 3 4 5];]

You may have to update the sort vector frequently, but if you sort by one
single column, that should be fast enough.
/ Tomas

"Ric Sherlock" <R.G.Sh...@massey.ac.nz> skrev i meddelandet
news:3c0d...@clear.net.nz...

Message has been deleted

David Ness

unread,
Dec 5, 2001, 8:53:05 AM12/5/01
to

These days 100,000 rows is a pretty `small' problem. Aren't you in danger of
wasting a lot of time and energy trying to do this in APL when it would be
a very short task in lots of other languages/environments?

Tomas Gustafsson

unread,
Dec 5, 2001, 11:26:46 AM12/5/01
to
(Forgot a small but important "not" in my previous post)

"Tomas Gustafsson" <we...@sci.fi> skrev i meddelandet
news:9ukrbg$c4q$1...@tron.sci.fi...
> 1) If you have 43 columns bla bla bla...
> But note that this
> may ***NOT*** apply if you have text data in the matrix as well, or other
levels of
> enclosure.


Morten Kromberg

unread,
Dec 5, 2001, 4:28:59 PM12/5/01
to
As Tomas Gustafsson has pointed out, the problem is actually not the upgrade
or "sort", which only uses one of your 43 columns, but the subsequent
reordering of the array, which makes a complete copy of the array. You have
a number of choices:

1) Find more workspace by purchasing more memory, or getting rid of some of
the other things in the workspace.

2) If possible, use Tomas' idea of partitioning the array by data type. If
some of your columns are NOT floating point, but could be represented as
integers or even booleans, there may be a substantial reduction in the
memory requirement (and the speed of many operations) if you split it into
separate Boolean, Integer and Floating arrays.

3) Also suggested by Tomas suggests, leave the array as it is and use the
"order" index to reorder bits of it as required by your application.

4) The quick and dirty solution, if none of the above appeal: Reorder your
array bit by bit, for example:

order <- {upgrade} Data[;3]
:For i :In {iota}43
Data[;i] <- Data[order;i]
:EndFor

... this will only require duplication of one column at a time. Just make
sure to reorder ALL of the columns :).

5) Combinations of the above...

Good luck! / Morten

-----Original Message-----
From: APL Language Discussion [mailto:AP...@unb.ca]On Behalf Of News
Gateway
Sent: 5. december 2001 04:35
To: AP...@LISTSERV.UNB.CA
Subject: Re: Space efficient sorting of arrays


X-From: "Ric Sherlock" <R.G.Sh...@massey.ac.nz>

Hi Morten,

Ric Sherlock

unread,
Dec 5, 2001, 10:03:19 PM12/5/01
to
Thank you all for your input,

David is probably right that there are other programs that could do this
relatively easily - however I don't think that I personally would save much
in terms of time and effort by trying to work out how to interface to some
other solution.

The block sorting (scan, search, merge) approach suggested by both Phil &
Mike sounds like it would be the "best" APL-based approach, but I confess
does sound a little more complicated than I wanted to get.

I haven't quite got my head around what Mike's code is actually doing yet
so, I'll have to investigate that further.

I will keep in mind the use the "order" index for other situations, but I
don't think it would work in this instance.

Which leaves me with the "sort one column at a time" or by datatype
approaches - these seem nice & simple but effective. If I can sort out a
way of automatically working out which columns store only integers & which
store floating point numbers, I'll use the datatype approach, otherwise I'll
use the column at a time approach.

Thanks again for everyone's help,

Ric

David Ness

unread,
Dec 5, 2001, 10:54:19 PM12/5/01
to
Ric Sherlock wrote:
>
> Thank you all for your input,
>
> David is probably right that there are other programs that could do this
> relatively easily - however I don't think that I personally would save much
> in terms of time and effort by trying to work out how to interface to some
> other solution.
>

At worst, write out your data and read it back in ascii.

I can understand, but perhaps it is worthwhile just establishing some
simple facts so you know what kind of a tradeoff you are actually making.

Just for test purposes I created a 24 MB file that had 780,000+ lines of junk
data. This file loaded into K in 2.9 sec, and it `sorted' in 9.2 sec. Writing
the result took 2.1sec.

I obviously have no idea how this compares with your real problem, but it
does suggest that a `solution' in K costs a grand total of less than 15 seconds,
The program, of course, is terse (about 30 chars).

Morten Kromberg

unread,
Dec 6, 2001, 1:39:01 PM12/6/01
to
But... the problem here is NOT the sorting, it is reordering the array. So
unless a "piece by piece" approach is taken, formatting the array in order
to write it to an ASCII file is almost definitely going to WS FULL even
earlier than the current algorithm.

The general problem is that, once your array size is something like 20% of
available workspace, any APL algorithms must be built with care, since many
APL expressions will cause 2-4 copies of the data to be made, sometimes
more.

Alas, as soon as you start to split the array up, your life becomes more
complicated. WS FULL remains my least favourite error message.

Note that some APL systems (I know this is true for at least Dyalog APL and
J) allow you to have arrays which are external to the workspace, mapped to a
file. In theory, APL just slows down when working on these arrays, although
I suspect there are some limits to what sort of operations you can perform.

/ Morten

-----Original Message-----
From: APL Language Discussion [mailto:AP...@unb.ca]On Behalf Of News
Gateway
Sent: 6. december 2001 05:30
To: AP...@LISTSERV.UNB.CA
Subject: Re: Space efficient sorting of arrays


X-From: David Ness <DN...@Home.Com>

David Ness

unread,
Dec 6, 2001, 3:03:38 PM12/6/01
to
Morten Kromberg wrote:
>
> But... the problem here is NOT the sorting, it is reordering the array. So
> unless a "piece by piece" approach is taken, formatting the array in order
> to write it to an ASCII file is almost definitely going to WS FULL even
> earlier than the current algorithm.
>
> The general problem is that, once your array size is something like 20% of
> available workspace, any APL algorithms must be built with care, since many
> APL expressions will cause 2-4 copies of the data to be made, sometimes
> more.
>
> Alas, as soon as you start to split the array up, your life becomes more
> complicated. WS FULL remains my least favourite error message.
>
> Note that some APL systems (I know this is true for at least Dyalog APL and
> J) allow you to have arrays which are external to the workspace, mapped to a
> file. In theory, APL just slows down when working on these arrays, although
> I suspect there are some limits to what sort of operations you can perform.
>

I am at a bit of a loss about how to respond, particularly since my use of
English doesn't find much distinction between `sorting' and `reordering' in
this context.

But let me resist the temptation to respond line by line by just summarizing my
view that the kind of problem presented here is likely to be a `15 sec problem',
and if it causes _any_ consternation or thought about computational strategy, you
are probably working in the wrong environment. I well remember the days that
sorting a 300k file pushed the edge of PC technology, but those days are _long_
gone, and now <100MB problems are often `small'. So if a 200K line 40Mb task is
causing you a problem, there is at least a prima facie case that you are causing
yourself trouble by choosing an environment that doesn't fit well.

A measure of how well an environment `fits' a problem is how much tangential work
it forces you to do. In this case, it sounded to me---at least at the superficial
level of knowledge we have of the problem here---like we were seeing a mis-fit.

Morten Kromberg

unread,
Dec 6, 2001, 4:45:15 PM12/6/01
to
David Ness <DN...@Home.Com> wrote:

> A measure of how well an environment `fits' a problem is how much
tangential work
> it forces you to do. In this case, it sounded to me---at least at the
superficial
> level of knowledge we have of the problem here---like we were seeing a
mis-fit.

Perhaps I misunderstood what you were suggesting - I though you were
suggesting that an external sorting tool would help Ric avoid WS FULL. All I
was saying was that, as far as I can see, this suggestion does not solve the
fundamental issue of the array being a bit too big for "wholesale
manipulation"; the code to do the external sort would essentially encounter
the same issues.

In other words, the effort required to export and reimport the data without
encountering WS FULL seems to be at least as great as using some of the
other solutions, which all have much lower operational complexity.

We don't have enough information about what Ric is tryong to do with his
data do judge whether it makes any sense for him to consider a new
environment for his entire application. He did say he was doing fine except
for this problem. Somehow, I suspect that he is trying to do more than sort
16Mb of data.

/ Morten

David Ness

unread,
Dec 6, 2001, 5:51:32 PM12/6/01
to
Morten Kromberg wrote:
>
> Perhaps I misunderstood what you were suggesting - I though you were
> suggesting that an external sorting tool would help Ric avoid WS FULL. All I
> was saying was that, as far as I can see, this suggestion does not solve the
> fundamental issue of the array being a bit too big for "wholesale
> manipulation"; the code to do the external sort would essentially encounter
> the same issues.

I think you understood me correctly. But what you say suggests to me a `poor'
environment. If it costs a whole lot to `roll-out' and `roll-in' I'd pass on
using the environment at all.

> In other words, the effort required to export and reimport the data without
> encountering WS FULL seems to be at least as great as using some of the
> other solutions, which all have much lower operational complexity.

Strikes me as a rather awful limitation of a programming environment.
I go into and out of environments all the time, generally at a trivial cost
(like a second or two to read/write 50mb files), thus allowing me the luxury
of using perl to handle perl tasks, k to handle k tasks etc. perl, k, j are
all pretty good at doing this kind of stuff. I can't afford any of the
fancy apls so I don't know how good they are at this, but I'd sure hope
you could get out and get back in easily.

> We don't have enough information about what Ric is tryong to do with his
> data do judge whether it makes any sense for him to consider a new
> environment for his entire application. He did say he was doing fine except
> for this problem. Somehow, I suspect that he is trying to do more than sort
> 16Mb of data.

I sure hope so. So far the problem sounds _way_ too easy to run into _any_ problem.

Morten Kromberg

unread,
Dec 6, 2001, 7:24:25 PM12/6/01
to
David Ness wrote:

> I think you understood me correctly. But what you say suggests to me a
`poor'
> environment. If it costs a whole lot to `roll-out' and `roll-in' I'd pass
on
> using the environment at all.

It is indeed trivial to roll in and roll out. But - as far as I can see - it
does not solve the original problem which was posed to the newsgroup by Ric
Sherlock.

Maybe I'm a bit slow today. Perhaps your point is actually "If you wanted to
go there, you wouldn't be starting from here"?

:-)

?

Morten Kromberg

David Ness

unread,
Dec 6, 2001, 8:04:48 PM12/6/01
to
Morten Kromberg wrote:
>
> It is indeed trivial to roll in and roll out. But - as far as I can see - it
> does not solve the original problem which was posed to the newsgroup by Ric
> Sherlock.

The original statement from the original Post was:

> Now I'm trying to sort large arrays (>100,000 rows) and am running into
> WSFULL problems.

And it seemed to me that if I was having trouble with that, all other
things being equal, I'd simply roll-out, sort, and roll-in, and I'd expect
it to cost me only a few seconds for the scale mentioned.

> Maybe I'm a bit slow today. Perhaps your point is actually "If you wanted to
> go there, you wouldn't be starting from here"?
>
> :-)
>

Perhaps it's me that's slow. But at least it isn't my computer languages!

Morten Kromberg

unread,
Dec 6, 2001, 11:23:28 PM12/6/01
to
David Ness wrote:

> The original statement from the original Post was:
>
> > Now I'm trying to sort large arrays (>100,000 rows) and am running into
> > WSFULL problems.
>
> And it seemed to me that if I was having trouble with that, all other
> things being equal, I'd simply roll-out, sort, and roll-in, and I'd expect
> it to cost me only a few seconds for the scale mentioned.

True, but on the line before, he had given the actual APL statement which
was failing (I was assuming that, in deciding to respond to this, you could
read and understand APL - sorry if this was an incorrect assumption).

From this statement, it is possible to determine that the root problem is
the manipulation of the entire array in a single operation, rather than the
fact that the operation was sorting as such. So, since rolling in and out
must also operate on the entire array, it is not a solution in itself. The
solution is to use one of the techniques which were suggested by several
people, to reduce the amount of data being operated upon. Using an external
sort would also be a solution, IF the same care were taken. But since the
array can in fact quite easily be sorted in APL (essentially up to the point
where APL can no longer be used to operate on the data at all), why roll it
in and out?

Anyway, the good news is that I will be off-line for the next 36 hours while
I move from GMT-8 to GMT+1.

/ Morten

David Ness

unread,
Dec 7, 2001, 1:37:27 AM12/7/01
to
Morten Kromberg wrote:
>
> True, but on the line before, he had given the actual APL statement which
> was failing (I was assuming that, in deciding to respond to this, you could
> read and understand APL - sorry if this was an incorrect assumption).

Though I never use APL anymore, I did teach it to a few thousand students and
haven;t yet lost the ability to read simple code. However, AFAIK nothing in the
language dictates how an implementation will or should approach any particular
problem, so I figured the implementation might be intelligent enough to avoid
the creation of unnecessary temporaries, particularly to do a simple roll-out.

> From this statement, it is possible to determine that the root problem is
> the manipulation of the entire array in a single operation, rather than the
> fact that the operation was sorting as such. So, since rolling in and out
> must also operate on the entire array,

Surely this isn't _necessarily_ true. Again, it may be a characteristic of a
weak implementation, but hardly a necessity.

[snip[

Ken Travers

unread,
Dec 7, 2001, 4:33:20 AM12/7/01
to

"David Ness" <DN...@Home.Com> wrote in message
news:3C0FCEEA...@Home.Com...

A few thoughts ...

* On the "Buy more memory" approach ... this may not be the last "large"
array that the person has to sort. Dashing out for more memory each time
the data gets bigger could be uncomfortable.

* On looking for another environment ... perhaps the person is creating
an APL system for general use by other parties. I find myself in this
situation. I am not averse to looking for algoritms within APL, but which
one depends on a few factors. A comprehensive solution would possibly use
them all, with some testing as to which is best for particular data.


Tomas Gustafsson

unread,
Dec 7, 2001, 4:58:09 AM12/7/01
to
David, do J and K use Windows memory by some "flexible" concept? In APL, you
have to tell the interpreter in advance how much memory you want to allocate
for the session, for example 100 MB. APL will then reserve 100 MB regardless
of how much it in fact uses internally.

The reason for asking is that i figured you would need twice that amount if
you pass an array between J and K, i.e. *if* both J and K use memory on the
same, pre-allocation basis, that amount would have to exist twice within the
operating system. And i could imagine that the same happens just because of
the large arrays themselves; if you have a 100 MB array in J and pass it on
to K, won't there be a need of totally 200 MB of simultaneous storage? Then
we have the same problem as when using APL alone; there is the original
array and the temporary copy of it.

...unless you pass the array by looping and reduce it's size in the giving
end while increasing it's size in the receiving end? Or use shared data
between different OS threads, which is a *rather* tricky thing.

In addition, i could imagine that the OS itself may need a relatively big
memory allocation, if you pass the data through OLE.

In fact i suspect that APLWin consumes memory *very* stupidly in this
particular case, creating three instances of the array, so in that sense you
are right.
/ Tomas

"David Ness" <DN...@Home.Com> skrev i meddelandet
news:3C101582...@Home.Com...

David Ness

unread,
Dec 7, 2001, 9:37:51 AM12/7/01
to
Tomas Gustafsson wrote:
>
> David, do J and K use Windows memory by some "flexible" concept? In APL, you
> have to tell the interpreter in advance how much memory you want to allocate
> for the session, for example 100 MB. APL will then reserve 100 MB regardless
> of how much it in fact uses internally.

Wish I knew enough to answer, but I am just a user and don't know _any_
implementation details. Thankfully I've never been forced to. I can say,
however, that neither J nor K require you to `pre allocate' any memory.
And I can remember watching (in my Windows `Task Manager') J eat memory and then
release it, and eat it again etc. while chewing on some big problem.

I can also say that when I was learning J and K I created some benchmarks that
involved things like (speaking informally here)
<sumof>(1 + <iota>n)<power>1.00001
just to get an idea of timing. IIRC the APLs I had access to `died' at about
n=100000 while J and K allowed me to go on into the tens of millions, so
something about memory use was vastly different.

> > The reason for asking is that i figured you would need twice that amount if
> you pass an array between J and K, i.e. *if* both J and K use memory on the
> same, pre-allocation basis, that amount would have to exist twice within the
> operating system. And i could imagine that the same happens just because of
> the large arrays themselves; if you have a 100 MB array in J and pass it on
> to K, won't there be a need of totally 200 MB of simultaneous storage? Then
> we have the same problem as when using APL alone; there is the original
> array and the temporary copy of it.

Well, my machines currently have somewhere between 300mb and 0.5Gb of Ram
(my last purchase was at about $100 for 256mb, so even I can afford this) and
with `swapping space' in windows effectively at least doubling that, I don't
generally run into `memory problems' for my scale of task. As a result, I've
never had to ask Arthur or Roger how they do any of their magic. They seem to
be able to make use of this kind of memory. All I can say is that I `don't see
it or worry about it' much.

> ...unless you pass the array by looping and reduce it's size in the giving
> end while increasing it's size in the receiving end? Or use shared data
> between different OS threads, which is a *rather* tricky thing.

Again. I'm a user, not an implementor. For me, every minute I have to spend
thinking about the implementation of my computer languages is a minute taken
away from working on whatever problem I am trying to solve.

> In addition, i could imagine that the OS itself may need a relatively big
> memory allocation, if you pass the data through OLE.
>
> In fact i suspect that APLWin consumes memory *very* stupidly in this
> particular case, creating three instances of the array, so in that sense you
> are right.
> / Tomas

My guess, and it is only a guess, is that your `model' of memory use is perhaps
outmoded wrt some of the newer languages. If my picture of the evolution of J, and
even more of K, is correct, then I think that `Ram Use' is something that they have
thought a great deal about. Indeed one of the major contribuions of Arthur and his
developers was a realization of just how cheap memory was getting, and just how
many otherwise hugely complex problems dissolve if you are able to effectively
use gigabyte sized memories to `keep database problems in Ram'.

In any case, using these languages allows me to be a beneficiary of all of these
efforts without understanding them in detail one bit.

So, all I can say is that I was stunned back 4 or 5 years ago when I ran my first
benchmarks of J, and later of K, against my APLs. What attracted me most to J and K
was not the stunning speed differences, although they were there, but rather it
was that J and K allowed me to tackle, without concern for implementation detail,
problems that were at least ten times larger, and indeed often a hundred times
larger.

Of course, that was just `my problems'. I obviously can make no claim for the
generality of my results.

Josef Sachs

unread,
Dec 7, 2001, 10:08:12 AM12/7/01
to
>>>>> On Fri, 07 Dec 2001 14:37:51 GMT, David Ness said:

> Indeed one of the major contribuions of Arthur and his developers
> was a realization of just how cheap memory was getting,
> and just how many otherwise hugely complex problems dissolve if you
> are able to effectively use gigabyte sized memories to `keep
> database problems in Ram'.

Arthur has developers?!

I would be extremely interested to know what fraction of k and kdb
were not written by Arthur.

David Ness

unread,
Dec 7, 2001, 10:43:00 AM12/7/01
to
Josef Sachs wrote:
>
> Arthur has developers?!
>
> I would be extremely interested to know what fraction of k and kdb
> were not written by Arthur.

Last time I visited Kx was before they moved, and there were several busy
people around. I didn't ask them for job descriptions though, so I really
have no idea. I also am surprised that you'd be `extremely interested' in
knowing, as I can't imagine that it would matter.

Josef Sachs

unread,
Dec 7, 2001, 1:19:58 PM12/7/01
to
>>>>> On Fri, 07 Dec 2001 15:43:00 GMT, David Ness said:

> I also am surprised that you'd be `extremely interested' in knowing,
> as I can't imagine that it would matter.

I am intrigued that you would be surprised by my interests, even my
extreme interests, whether or not you consider them to matter. Does
that also surprise you?

Be that as it may, here are a couple types to whom it might matter
how many developers a product has:

(1) Those who share Arthur's once-avowed attitude of too-many-cooks-
spoil-the-broth in the extreme. For such people, every additional
developer means a sacrifice in product quality. I can think of very
few core parts of A or A+ that Arthur did not write. Anecdotally,
I was present when Arthur suggested to John Mack, then Morgan Stanley's
Fixed Income Division head, that all of FID's applications ought
to be developed by no more than 6 programmers.

(2) Those who would not accept the risk that a lone developer would
be hit by a truck (sorry to be morbid), lose interest, or otherwise
not be available for continued development and support. Availability
of source code may mitigate this to a degree.

David Ness

unread,
Dec 7, 2001, 2:17:48 PM12/7/01
to
Josef Sachs wrote:
>
> (1) Those who share Arthur's once-avowed attitude of too-many-cooks-
> spoil-the-broth in the extreme. For such people, every additional
> developer means a sacrifice in product quality. I can think of very
> few core parts of A or A+ that Arthur did not write. Anecdotally,
> I was present when Arthur suggested to John Mack, then Morgan Stanley's
> Fixed Income Division head, that all of FID's applications ought
> to be developed by no more than 6 programmers.
>
> (2) Those who would not accept the risk that a lone developer would
> be hit by a truck (sorry to be morbid), lose interest, or otherwise
> not be available for continued development and support. Availability
> of source code may mitigate this to a degree.

Both very relevant points. You obviously _know_ much more about the actual
history of K (and A+) than I do, which I guess increases `my surprise' at
`your surprise', but since---as you rightly point out---my surprise isn't
a matter of any practical consequence, that point can rest.

And I like surprises, anyway.

As to the substance of your comments, if you represent Arthur's view correctly
(and it does `sound' right to me) I'd wholeheartedly concur with it. IME,
programmers have the same productivity performance as do policemen: you get
1/n work out of a collection of n workers, i.e. 2 people do half the work of one
(they spend a lot of time talking with one another---you know, nose of one
police car sniffing tail of another down in the town park that is _least_ subject
to crime).

And to the `hit by a truck' argument, I am sympathetic to it, but unmoved. Multiple
`developers' is no guarrantee that loss of a major design force could be survived.
Strikes me there is just as much chance that the code could be passed to someone
good that was an outsider as that someone good might rise out of a group
already working on some project.

Josef Sachs

unread,
Dec 7, 2001, 2:42:18 PM12/7/01
to
>>>>> On Fri, 07 Dec 2001 19:17:48 GMT, David Ness said:

> You obviously _know_ much more about the actual history of K (and A+)

> than I do, [ ... ]

I did not mean to imply that I know much about the history of K,
though I do for A and A+.

> As to the substance of your comments, if you represent Arthur's view
> correctly (and it does `sound' right to me) I'd wholeheartedly
> concur with it. IME, programmers have the same productivity
> performance as do policemen: you get 1/n work out of a collection of
> n workers, i.e. 2 people do half the work of one (they spend a lot
> of time talking with one another---you know, nose of one police car
> sniffing tail of another down in the town park that is _least_
> subject to crime).

Interestingly, that buzzword-du-jour, Extreme Programming,
strongly argues that two developers are much better than one.

> And to the `hit by a truck' argument, I am sympathetic to it, but
> unmoved. Multiple `developers' is no guarrantee that loss of a major
> design force could be survived.

Yes, no guarantee, but multiple developers can hardly help but increase the
likelihood of survival.

> Strikes me there is just as much chance that the code could be passed
> to someone good that was an outsider as that someone good might rise out
> of a group already working on some project.

Right, as long as you've got the source code. I'm guessing that few people
outside of Kx have the source code to k or kdb (though Arthur has mentioned
that he'd like to make the source available at some point).

(I'm slowly progressing toward using A+ at work.)

cbbr...@acm.org

unread,
Dec 7, 2001, 2:56:37 PM12/7/01
to
Josef Sachs <sa...@panix.com> writes:
> (I'm slowly progressing toward using A+ at work.)

A+ looks pretty interesting, what with "persistent arrays" and the
"adap" communications scheme.

Is there some perspicuous description out there of the _major_
differences between it and more "traditional" APLs?
--
(reverse (concatenate 'string "gro.gultn@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/
"What did we agree about a leader??"
"We agreed we wouldn't have one."
"Good. Now shut up and do as I say..."

David Ness

unread,
Dec 7, 2001, 3:24:13 PM12/7/01
to
Josef Sachs wrote:
>
> Interestingly, that buzzword-du-jour, Extreme Programming,
> strongly argues that two developers are much better than one.

Really? Where? I've read many of the XP Wikis, and while I have seen the argument
that you need more than one `Voice' in the client, I've missed the argument that
you need more than one developer. Give me a pointer, I'd like to chase it down.
Sometimes I agree with Ward, often I don't.

> > And to the `hit by a truck' argument, I am sympathetic to it, but
> > unmoved. Multiple `developers' is no guarrantee that loss of a major
> > design force could be survived.
>
> Yes, no guarantee, but multiple developers can hardly help but increase the
> likelihood of survival.

I think I'd argue that, at least one more round. Several of my most successful
developments involved rather complete `handoff' from one small group to
a (distinct) small group of others. And these systems survived for a _very_
long time across a very changed technological base (like 20+ years across
Mainframe -> Time Sharing -> PC). Others, with a whole posse of developers,
often didn't survive one change of management and had _very_ short lives.
Multiple developers -> larger costs -> higher probability of Axe in hard times,
so I;m not convinced your argument washes...

In any event, I don't regard it as obvious that `multiple developers can hardly
help but increase...'

> Right, as long as you've got the source code. I'm guessing that few people
> outside of Kx have the source code to k or kdb (though Arthur has mentioned
> that he'd like to make the source available at some point).

Perhaps that is obviously true for Arthur's code. For (just about) anyone else's
I wouldn't be so sure how important the source was, anyway. IME a surprising
amount of the `core' of a system was really quite obvious by just looking at the
human interface.

> (I'm slowly progressing toward using A+ at work.)

Nice to know someone's using it. The last mail I saw on it was a couple of months
ago, so I was wondering.

Josef Sachs

unread,
Dec 7, 2001, 3:36:16 PM12/7/01
to
>>>>> On Fri, 07 Dec 2001 20:24:13 GMT, David Ness said:

> Josef Sachs wrote:
>> Interestingly, that buzzword-du-jour, Extreme Programming,
>> strongly argues that two developers are much better than one.

> Really? Where? I've read many of the XP Wikis, and while I have seen
> the argument that you need more than one `Voice' in the client, I've
> missed the argument that you need more than one developer. Give me a
> pointer, I'd like to chase it down. Sometimes I agree with Ward,
> often I don't.

A quick search finds this,
from <URL: http://www.extremeprogramming.org/rules/pair.html>:

All code to be included in a production release is created by two
people working together at a single computer. Pair programming
increases software quality without impacting time to deliver. It is
counter intuitive, but 2 people working at a single computer will add
as much functionality as two working separately except that it will be
much higher in quality. With increased quality comes big savings later
in the project.
The best way to pair program is to just sit side by side in front of
the monitor. Slide the key board and mouse back and forth. One person
types and thinks tactically about the method being created, while the
other thinks strategically about how that method fits into the class.
It takes time to get used to pair programming so don't worry if it
feels awkward at first.

Josef Sachs

unread,
Dec 7, 2001, 3:53:25 PM12/7/01
to
>>>>> On Fri, 07 Dec 2001 19:56:37 GMT, <cbbr...@acm.org> said:

> Is there some perspicuous description out there of the _major_
> differences between it and more "traditional" APLs?

Perspicuosity is in the eye of the beholder?
See: <URL: http://www.aplusdev.org/APlusRefV2_62.html>

Brian McGuinness

unread,
Dec 10, 2001, 8:58:00 AM12/10/01
to
David Ness <DN...@Home.Com> wrote in message news:<3C10D417...@Home.Com>...

> Tomas Gustafsson wrote:
> >
> > David, do J and K use Windows memory by some "flexible" concept? In APL, you
> > have to tell the interpreter in advance how much memory you want to allocate
> > for the session, for example 100 MB. APL will then reserve 100 MB regardless
> > of how much it in fact uses internally.
>
> Wish I knew enough to answer, but I am just a user and don't know _any_
> implementation details. Thankfully I've never been forced to. I can say,
> however, that neither J nor K require you to `pre allocate' any memory.
> And I can remember watching (in my Windows `Task Manager') J eat memory and then
> release it, and eat it again etc. while chewing on some big problem.
>
> I can also say that when I was learning J and K I created some benchmarks that
> involved things like (speaking informally here)
> <sumof>(1 + <iota>n)<power>1.00001
> just to get an idea of timing. IIRC the APLs I had access to `died' at about
> n=100000 while J and K allowed me to go on into the tens of millions, so
> something about memory use was vastly different.

This is interesting. I would expect <iota>n to be stored as an arithmetic
progression vector, and the expression <sumof>(1 + <iota>n)<power>1.00001 to
be evaluated in a loop without the 1 + <iota>n having to be expanded as an
intermediate step. So there's no reason why this should cause a WS FULL
error unless the workspace was nearly full in the first place. But then
this expression would not be comparable to the sorting problem mentioned in
the original message.

I wonder how the most recent APL2, Sharp APL, and so on compare to J, K,
and A+ in terms of speed, maximum array size, etc. This might be a good
topic for an APL Quote Quad issue. It would also be nice to learn more
about the internal workings of the interpreters. In principle, I see no
reason why APL shouldn't be able to compete well with the other languages
given interpreter writers of equivalent skill.

--- Brian

David Ness

unread,
Dec 10, 2001, 10:35:57 AM12/10/01
to
Brian McGuinness wrote:
>
> This is interesting. I would expect <iota>n to be stored as an arithmetic
> progression vector, and the expression <sumof>(1 + <iota>n)<power>1.00001 to
> be evaluated in a loop without the 1 + <iota>n having to be expanded as an
> intermediate step. So there's no reason why this should cause a WS FULL
> error unless the workspace was nearly full in the first place. But then
> this expression would not be comparable to the sorting problem mentioned in
> the original message.

I don't have `expectations', so I'm always just stuck with `facts'. The
benchmarks that I reported here several years ago were only of the APLs I
had access to which, IIRC, were Manuguistics V11, AplWin-III and APL2 from IBM.
They all `lost' to K on my benchmark by varying amounts, but again IIRC there was
usually a factor of 5 or 10 involved. The WS full error occurred in an otherwise
unoccupied workspace, as in all cases I `started fresh' (again, IIRC---it was,
after all years ago) and J and K were able to handle problems at least 5 to 10 times
the size of any of my APLs. You can probably find the results in Google Groups
under titles like `J/APL Timings' or `J/K/APL Timings'.

> I wonder how the most recent APL2, Sharp APL, and so on compare to J, K,
> and A+ in terms of speed, maximum array size, etc. This might be a good
> topic for an APL Quote Quad issue. It would also be nice to learn more
> about the internal workings of the interpreters. In principle, I see no
> reason why APL shouldn't be able to compete well with the other languages
> given interpreter writers of equivalent skill.

Ancient research indicated that there is often greater than a 10:1 `performance'
ratio among programmers in a typical group. `Interpreter writers' IME are one of the groups where notions of `equivalent skill' don't obtain. There are _huge_ differences in skill levels. And, this is particularly the case if you give the writer the power to re-define the language on the way. And there are some people who have a particular genuis and feel for the task.

And different interpreter writers bring a different world view to their task. Many
APLs, for example, were concieved and implemented in circumstances where memory cost
money. They may well be conservative in its use. Later languages, perhaps K is an
example, assumed (rightly) that memory was going to be cheap, and that individuals
would be able to afford multi-gigabyte rams if they had problems that demanded them.
So language definition and implementation may have proceeded along completely different
lines, thus producing huge differences in performance.

Stefano Lanzavecchia

unread,
Dec 10, 2001, 10:45:42 AM12/10/01
to
> This is interesting. I would expect <iota>n to be stored as an
> arithmetic progression vector, and the expression <sumof>(1 +
> <iota>n)<power>1.00001 to be evaluated in a loop without the 1 +
> <iota>n having to be expanded as an intermediate step.

As far as I know none of the implementation on PCs uses arithmetic
progression vectors.

> I wonder how the most recent APL2, Sharp APL, and so on compare to J,
> K, and A+ in terms of speed, maximum array size, etc. This might be a

> good topic for an APL Quote Quad issue.

Comparisons aren't easy. Especially those involving K. First of all K
has different scoping rules (lexical scope, like Dyadic's dynamic
functions). Also it doesn't have booleans or short integers, but only 32
bit integers. Also it doesn't promote an integer to a float in case of
overflow...
--
WildHeart'2k1 - mailto:s...@apl.it
Homepage: http://come.to/wildheart/
<<<Nijuyon jikan ugomeku machi o, TONIGHT TONIGHT kakenukeru --- The
city pounds with life, 24 hours a day. Tonight, tonight, I'm charging
on
through.>>>

James J. Weinkam

unread,
Dec 10, 2001, 12:55:39 PM12/10/01
to
Stefano Lanzavecchia wrote:
>
> As far as I know none of the implementation on PCs uses arithmetic
> progression vectors.
>
Running APL2 for OS/2 Entry Edition:

{quad}wa
16692560
z{is}{iota}2000000
{quad}wa
16692528
z{is}z+1
{quad}wa
16692528
z[2]{is}0
{quad}wa
8692528

Stefano Lanzavecchia

unread,
Dec 10, 2001, 1:34:48 PM12/10/01
to

As far as I know, at least one implementation of APL on PCs uses
arithmetic progression vectors. :) Jokes apart, thanks for this. I had a
feeling IBM's APL2 could be the one to implement this, but I had no way
of testing...


--
WildHeart'2k1 - mailto:s...@apl.it
Homepage: http://come.to/wildheart/

<<<Aiyaa, NYANNICHUAN ni ochite shimatta ---
Aiyaa, you fell in Nyannichuan>>>

James J. Weinkam

unread,
Dec 10, 2001, 4:14:17 PM12/10/01
to
Stefano Lanzavecchia wrote:
>
> As far as I know, at least one implementation of APL on PCs uses
> arithmetic progression vectors. :) Jokes apart, thanks for this. I had a
> feeling IBM's APL2 could be the one to implement this, but I had no way
> of testing...
> --
I'm pretty sure even the DOS version of APL2 used arithmetic progressions when it could. If I
remember, I'll check it out next time I fire up the old xt.

Roger Hui

unread,
Dec 10, 2001, 4:44:33 PM12/10/01
to
I am interested in the {quad}wa numbers when you
do reshape of an APV. That is, what are the
outputs when the following lines are executed
in a clear ws?

{quad}wa
z{is}2000 1000{rho}{iota}2000000
{quad}wa


----- Original Message -----
From: "Stefano Lanzavecchia" <s...@APL.IT>
To: <AP...@LISTSERV.UNB.CA>
Sent: Monday, December 10, 2001 10:20 AM
Subject: Re: Space efficient sorting of arrays

> > Stefano Lanzavecchia wrote:
> > >
> > > As far as I know none of the implementation on PCs uses arithmetic
> > > progression vectors.
> > >


> > Running APL2 for OS/2 Entry Edition:
> >
> > {quad}wa
> > 16692560
> > z{is}{iota}2000000
> > {quad}wa
> > 16692528
> > z{is}z+1
> > {quad}wa
> > 16692528
> > z[2]{is}0
> > {quad}wa
> > 8692528
>

Ted Edwards

unread,
Dec 10, 2001, 10:57:10 PM12/10/01
to
James J. Weinkam wrote:

> I'm pretty sure even the DOS version of APL2 used arithmetic progressions when it could. If I
> remember, I'll check it out next time I fire up the old xt.

It did. In fact Jim Brown got the idea from APL*STAR.

Ted


James J. Weinkam

unread,
Dec 11, 2001, 2:27:45 AM12/11/01
to
Roger Hui wrote:
>
> I am interested in the {quad}wa numbers when you
> do reshape of an APV. That is, what are the
> outputs when the following lines are executed
> in a clear ws?
>
> {quad}wa
> z{is}2000 1000{rho}{iota}2000000
> {quad}wa
>
{quad}wa
16692560

z{is}2000 1000{rho}{iota}2000000
{quad}wa
8692528


Now we know where they drew the line.

Roger Hui

unread,
Dec 11, 2001, 9:36:04 AM12/11/01
to
Thanks. Continuing with the sniffing around, can you please
time the following expressions. In each case time the expression
using x, then time it using the APV a.

x{is}?5e5{rho}1e6
a{is}1+2{times}{iota}5e5

{grade up}x vs. {grade up}a
x{iota}x vs. a{iota}a
a{iota}x vs. a{iota}a
+/x vs. +/a
{reverse}x vs. {reverse}a


----- Original Message -----
From: "News Gateway" <owner...@sol.sun.csd.unb.ca>
To: <AP...@LISTSERV.UNB.CA>
Sent: Monday, December 10, 2001 23:59 PM
Subject: Re: Space efficient sorting of arrays

> X-From: "James J. Weinkam" <j...@cs.sfu.ca>

Michael K. Rosenberg

unread,
Dec 11, 2001, 9:33:07 PM12/11/01
to
K 2.9 2001-10-17 Copyright (C) 1993-2001 Kx Systems
Renew 2002-01-15
Licensed to: dfa-fourteen

/times in milliseconds
x:500000 _draw 1000000
a:1+2*!500000
\t <x
328
\t <a
140
\t x?/:x
156
\t a?/:a
78
\t a?/:x
93
\t a?/:a
78
\t +/x
15
\t +/a
15
\t |x
15
\t |a
15

"Roger Hui" <rhu...@shaw.ca> wrote in message
news:20011211.1...@sol.sun.csd.unb.ca...

James J. Weinkam

unread,
Dec 12, 2001, 6:53:32 PM12/12/01
to
Roger Hui wrote:
>
> Thanks. Continuing with the sniffing around, can you please
> time the following expressions. In each case time the expression
> using x, then time it using the APV a.
>
> x{is}?5e5{rho}1e6
> a{is}1+2{times}{iota}5e5
>
> {grade up}x vs. {grade up}a
> x{iota}x vs. a{iota}a
> a{iota}x vs. a{iota}a
> +/x vs. +/a
> {reverse}x vs. {reverse}a
>
Running: APL2 for OS/2 Entry Edition Service Level 19 on 266 MHz Celeron w 128 MB ram
Merlin Convenience Pack (MCP) with CP1 and 01/11/30 kernel.

Times are in seconds.


'timer' need 'jutil'
x{is}?5E5{rho}1E6
a{is}1+2{times}{iota}5E5
10 timer '{grade up}a'
0.0002
10 timer '{grade up}x'
1.9223
10 timer '{grade down}a'
0.0002
10 timer '{grade down}x'
1.934
10 timer 'a{iota}a'
0.1983
10 timer 'x{iota}x'
0.8443
10 timer 'a{iota}a'
0.1982
10 timer 'a{iota}x'
0.1951
10 timer '+/x'
0.0508
10 timer '+/a'
0.0881
10 timer '{reverse}x'
0.0844
10 timer '{reverse}a'
0.2537
)off

Reverse is certainly a surprise considering the results for grade up and grade down.
Simalarly, +/ shoulcn't have been that hard. Oh well.

Robert Bernecky

unread,
Dec 16, 2001, 5:59:50 PM12/16/01
to
See interlarding:

Roger Hui wrote:
>
> Thanks. Continuing with the sniffing around, can you please
> time the following expressions. In each case time the expression
> using x, then time it using the APV a.
>
> x{is}?5e5{rho}1e6
> a{is}1+2{times}{iota}5e5
>
> {grade up}x vs. {grade up}a
> x{iota}x vs. a{iota}a
> a{iota}x vs. a{iota}a
> +/x vs. +/a
> {reverse}x vs. {reverse}a

APEX uses Array Morphology [APL93 -- my how time flies] to detect that
"a" is an APV. This gave us linear-time algorithms for all of the above
except reverse. We could handle reverse by, as I suggested to you
back in the dark ages, by separating arrays from descriptors, and
using addressing polynomials to perform most structural, and some
selection
operations. I think this was done in APL2; the idea was pioneered by
Guibas and
Wyatt in a 1978 POPL paper. They showed how to do transpose, reverse,
some takes/drops, some selections, by manipulating array descriptors.
This is swell because we end up with unit-time structural functions
whose
execution time is independent of array shape.

Again, static analysis (compiler) can help because the cost of
deferencing
array elements though an addressing polynomial is high; analysis can
show
where it's not worth doing.

I am of the opinion that static analysis in a semi-interpreted
environment
can produce array code that executes, for all practical purposes, as
well as,
and sometimes better than, hand-coded C or FORTRAN.

Bob

--
Robert Bernecky Snake Island Research Inc.
bern...@acm.org 18 Fifth Street, Ward's Island
+1 416 203 0854 Toronto, Ontario M5J 2B9 Canada
http://www.snakeisland.com

Robert Bernecky

unread,
Dec 16, 2001, 5:59:49 PM12/16/01
to
Ted missed an earlier chapter here, I believe. APL*STAR was written, if
I recall,
circa 1971-1972. Phil Abrams published his PhD Dissertation "An APL
Machine",
in 1970. In it, Phil discussed "J-vectors" (No relationship to the
language,
as far as I know.), which were arithmetic progression vectors on
integers,
expressed as start, stride, count (in origin zero, of course), as:
start+stride {times}{iota} count

APVs/Jvectors were used in the I.P. Sharp/Intersystems Siemens 4004 APL
interpreter. The feeling in the development group at the end of the
project
was that they were interesting and fun, but rarely made a substantial
performance
difference in practice; they were also a rich source of code faults, due
to
interactions with other parts of the interpreter and attempts to special
case
APV code.

In SHARP APL, we introduced (before the Siemens project, if I recall)
ONE
special case relating to APVs: The idiom "compress iota" [replicate iota
in newspeak] of the form: b compress iota n

iota n returns the first n integers; for each element i of that result,
b gives B[i] copies of that element. In the case where b is Boolean,
this
is a selection or masking operation:

1 0 1 1 1 0 /iota 6
0 2 3 4

1 2 55 3 / 4 2 0 5
1 1 1 1 2 2 3 3 3 3 3

We special-cased the idiom "compress iota", to avoid generating the
potentially
large intermediate result iota N. This met a large number of common
needs
at little cost or complexitg. [compress iota is often used with large
array operations to select an index set for some operation. In such
cases,
generating all of iota N is potentially a memory eater. A tupical
example might
be: Replace all the blanks in TXT by DOT. This could be done as:

TXT[(TXT= ' ') compress iota shape TXT) {assign} DOT

Now, few comments on APVs in APL interpreters:

a. I believe they are rarely of significant enough use in REAL
applications
[not Gee Whiz benchmarks of trivial expressions] to warrent their
cost (See next item).

b. Inclusion of ANY special-case code in an interpreter is usually a net
LOSS to
application-wide performance, as measured by the cost of all
applications (not just
your favorite application). This is because [Go see my MS Thesis on
APEX at
www.snaskeisland.com for citations on array size distribution in real
applications]
most operations in APL are, sadly enough, performed on small arrays
in which the
overhead of setup/conformability checking, special-case checking,
etc., are NOT
amortized over the array elements. This means that the performance of
most
applications can be measured quite well as a function of the number
of primitives
executed. That is, like the CDC STAR100 hardware, array operation
startup time
dominates execution time.

Probably the best example of a poor design choice of this nature in
Standard APL is
that of {singleton extension} on scalar functions. This is a good
idea {scalar extension}
taken to a bad extreme, in my opinion. Here are several reasons why,
although we're
merely looking at the impact on performance in this posting,
supposedly:

- Array + scalar: this extends the scalar to be, in effect the same
shape as the
array, then performs element-wise addition. This is a Good
Thing. The shape
of the result is the shape of Array.

- Array + 1-element vector: Here we start to go downhill. What is
the shape of
the result? Well, it's clearly the shape of Array. Oops. What is
Array is a scalar?
Well, then it's the shape of the vector.

- Array + singleton-of-arbitrary-rank: Here comes the nasty bit:
What's the shape of the result? Well, of course, it's the shape of
Array. Sometimes.
But what if we have two singletons? Then the shape is the shape of the
argument of
greater rank.
Worse yet, the RANK of result depends on the SHAPE (not rank!) of
Array!
3-element-rank-2 + rank-3-singleton produces a rank-2 result
1-element-rank-2 + rank-3-singleton produces a rank-3 result

The interpreter has to analyze this state of affairs on every execution,
due to the
dependence on shape, so it has to make checks on the ranks of both
arguments, as well as on
their shapes. This slows down every execution of + [and every other
scalar function],
and [this is the key point] REGARDLESS of whether the application
happens to exercise
that particular feature or not.

I think the J approach is superior here: Extend scalars. Do not extend
singletons.
It's simpler to understand; it facilitates humand and machine
comprehension, and
is more efficient.

Now, back to APVs. They have a similar problem to that of scalar
extension -- the interpreter
ends up being riddles with special checks to see if, "oh oh, look! -- I
can special case
transpose on J-vectors and get frequent flyer CPU points. Oh wow oh
wow!"
These checks slow down normal execution, and make for buggy interpreter
code.

\begin{compiler/jit rant}
Now, if things are really this black, how can we gain the advantages of
APVs without
making APL developers crazy debugging code? I believe I showed how to do
that in
the APEX compiler. These techniques could be adapted for use in a JIT
interpreter/compiler
for a lot of common applications, I suspect, to great advantage. [If
someone wants to
fund this sort of development, please let me know, as I need brain
stimulation and mortgage money
right now, but am only getting the latter.] Here's what I did in APEX:


a. Perform static analysis of the application [or some set of popular
functions] to determine
where APVs can be of use. [ In fact, the compiler didn't need to do
this, as we'll see later.]

b. Use sophisticated array optimization techniques to perform loop
fusion, array contraction, and
other optimizations. This combines disparate array operation loops
into single loops, and
eliminates the array-valued temps [such as iota n] that were the
source of this posting.

The Really Good Thing about this approach is that the compiler writer
can do it piecemeal -- there
is no need to analyze how this primitive works with that APV -- a lot of
it gets automated for you.
For example, the logd2 benchmark in APEX is a signal processing code
that takes scales and
clips a vector of acoustic signal info, then takes a first difference on
that. Finally, it
sums the result to give a single result back for ease of benchmarking,
although that part is
clearly not part of the original application. The code initially cranked
out by
APEX did NO optimization. It merely called functions for each APL
primitive.
The back end analyzed those calls, folded the code together, eliminated
a number of loops,
and produced a roughly 20-fold speedup in what was very good APL vector
code.

Those speedups came from:
- elimination of huge vector temps,
- scalar, rather than vector code
- elimination of loop induction variable maintenance, etc., code
- localization of memory references

Note that I did NOTHING special to obtain these speedups -- the compiler
optimizers
did it all for me.

How does this tie in to APVs? Let's consider another trivial benchmark:
+/ iota N

This sums the first N integers. The generated code produced two function
calls -- one
generated with a loop making an N-element vector result.
The second had a loop to consume that vector and produce a scalar
result.
That's how an APL interpreter without APVs would do it naively. [Side
note: As an example
of a silly optimization, consider +/APV. This is trivial to implement,
as Gauss noted,
but how much would it get used in practice?]
With no optimizations, the compiled APEX code runs about the same as an
APL interpreter.
Now, let's shove it through the optimizers:

a. Loop fusion takes the two loops and merges them into one.
b. Array contraction notes that we really don't need the iota result, so
it never generates
the vector.

Net result:

- The optimizers achieve the same effect as an APV in the absence of
Gauss: the code operates
in 24 bytes of space regardless of the value of N.

- The compiler writer did NOTHING special to obtain this result.

Hence, I claim that simple, general facilities will usually outperform
special-cased code,
simply because life is too short to get all the special cases.
Furthermore, the
compiler will pick up special cases we haven't even considered, just
because it discovers them
in the course of optimizing.


Here's another example of this sort of thing, and then I'll shut up:

The SHARP APL/J rank conjunction allows a verb/function to be applied
uniformly
to sub-arrays of its argument(s). For example, Mat + rank 1 Vector adds
each
row of Mat to Vector, giving a matrix result. Naive interpreter code
would basically call + as many times as needed on explicitly selected
subarrays,
then glue the results together to form the final result. Correct answer,
but slow.
Greg Bezoff, Charles Brenner, and I spent a fair bit of time and effort
special-casing interpreter primitives to obtain the several-hundred-fold
speed improvements available by special-casing rank within primitives.
Tedious,
boring, and error-prone.

In APEX, I wrote about a dozen lines of C code to generate macro calls
that, to
the casual viewer, were loops identical to the naive call-and-glue slow,
naive interpreter
approach. HOWEVER, the optimizers neatly did away with these loops, or
in-lined them,
so that I obtained the effect of special-casing ALL primitives and
user-defined functions
in one fell swoop, with NO changes to the system. This is the sort of
approach we
should be taking -- along the lines of Sven-Bodo Scholz' SAC -- DELETE
things from
the language, delete things from the underlying run-time system, and
leave the job
of performance to optimizers and static analysis.

\end{compiler/j rant}

Of course, APVs have their virtues, but that's for another time...
Array Morphology shows some examples.

Bob

--

Roger Hui

unread,
Dec 16, 2001, 5:53:44 PM12/16/01
to
(2nd attempt at msg. First attempt appears to have disappeared.)

Bob Bernecky writes on Sunday, December 16:

> Roger Hui wrote:
> >
> > Thanks. Continuing with the sniffing around, can you please
> > time the following expressions. In each case time the expression
> > using x, then time it using the APV a.
> >
> > x{is}?5e5{rho}1e6
> > a{is}1+2{times}{iota}5e5
> >
> > {grade up}x vs. {grade up}a
> > x{iota}x vs. a{iota}a
> > a{iota}x vs. a{iota}a
> > +/x vs. +/a
> > {reverse}x vs. {reverse}a
>
> APEX uses Array Morphology [APL93 -- my how time flies] to detect that
> "a" is an APV. This gave us linear-time algorithms for all of the above
> except reverse. We could handle reverse by, as I suggested to you
> back in the dark ages, by separating arrays from descriptors, and
> using addressing polynomials to perform most structural, and some
> selection operations. I think this was done in APL2; the idea was

> pioneered by Guibas and Wyatt in a 1978 POPL paper. ...

Also, I would be surprised if any of these things take linear
linear time in APL2, if the interpreter supports APV and
already knows that the argument is an APV. For example,
{grade up}a is either {iota}n or its reverse depending
on whether the multiplier for a is positive or 0, or negative.
To compute this result (itself an APV) should take constant time.

Roger Hui

unread,
Dec 16, 2001, 7:25:55 PM12/16/01
to
I made no claims about the utility of APV.
All I am saying, is that IF the interpreter has APVs,
then things like grade, sum, reverse should take
constant time, not the linear time that you said.


----- Original Message -----
From: "Robert Bernecky" <bern...@acm.org>
Newsgroups: comp.lang.apl
To: "Roger Hui" <rhu...@shaw.ca>
Sent: Sunday, December 16, 2001 14:38 PM
Subject: Re: Space efficient sorting of arrays


> Roger Hui wrote:
> >
> > Bob Bernecky writes on Sunday, December 16:

> > > APEX uses Array Morphology [APL93 -- my how time flies] to detect that
> > > "a" is an APV. This gave us linear-time algorithms for all of the
above
> > > except reverse. We could handle reverse by, as I suggested to you
> > > back in the dark ages, by separating arrays from descriptors, and
> > > using addressing polynomials to perform most structural, and some
> > > selection operations. I think this was done in APL2; the idea was
> > > pioneered by Guibas and Wyatt in a 1978 POPL paper. ...
> >
> > Also, I would be surprised if any of these things take linear
> > linear time in APL2, if the interpreter supports APV and
> > already knows that the argument is an APV. For example,
> > {grade up}a is either {iota}n or its reverse depending
> > on whether the multiplier for a is positive or 0, or negative.
> > To compute this result (itself an APV) should take constant time.
>

> How often does upgrade APV appear in real code? I can't think of
> a good use for it. The ability to improve the performance of a primitive
> case that rarely gets used in practice is not such a big deal, methinks,
> particularly if that case has to get checked for by every time through.
> Dispatching by type can get around some of this, but the code explosion
> [and the test case explosion that comes with it] is nasty.

Roger Hui

unread,
Dec 16, 2001, 7:24:26 PM12/16/01
to
In J, grade on a permutation vector takes linear time.
In particular, the second grade in grade grade x
takes linear time.


----- Original Message -----
From: "Robert Bernecky" <bern...@acm.org>
Newsgroups: comp.lang.apl
To: "Roger Hui" <rhu...@shaw.ca>
Sent: Sunday, December 16, 2001 14:34 PM
Subject: Re: Space efficient sorting of arrays

> Roger Hui wrote:
> >
> > Bob Bernecky writes on Sunday, December 16:
> >

> > > Roger Hui wrote:
> > > >
> > >
> > > APEX uses Array Morphology [APL93 -- my how time flies] to detect that
> > > "a" is an APV. This gave us linear-time algorithms for all of the
above
> > > except reverse. We could handle reverse by, as I suggested to you
> > > back in the dark ages, by separating arrays from descriptors, and
> > > using addressing polynomials to perform most structural, and some
> > > selection operations. I think this was done in APL2; the idea was
> > > pioneered by Guibas and Wyatt in a 1978 POPL paper. ...
> >

> > But Bob, the reverse of an APV is also an APV.
> > A demonstration in J. The linear representation 5!:5
> > computes an expression that generates the named object,
> > and:
> >
> > lr=: 3 : '5!:5 <''y.'''
> >
> > lr 3+4*i.10
> > 3+4*i.10
> >
> > lr |. 3+4*i.10
> > 39+_4*i.10
>
> Yes, and there are a bazillion other tiny optimizations like this
> that are dead easy to implement and of minimal utility in practical
> applications. We get fixed-time reverse for free with the Guibas & Wyatt
> work on
> ALL arrays, not just APVs. I'm just not convinced that we can get them
> implemented in interpreted code without slowing down everything else
> that doesn't use them. Not a big slowdown, of course, but with enough
> bricks, you
> can build a wall.
>
> Now, having said that, I also believe there remains substantial utility
> in having APVs in a compiler setting, because there still remain useful
> optimizations that are not handled in more general settings. [I'm trying
> to think
> of a good example...]
>
> Also, remember that array morphology gives us the ability to track and
> detect
> permutation vectors [PV](of which APVs can be considered a rough
> subset), which give us
> a lot of power on the algorithm design front. Granted, we get fixed-time
> upgrade
> on an APV, and only linear-time upgrade on a PV, but where do you run
> into upgrades
> of APVs in real applications? Upgrades of PVs appear in idioms a lot,
> such as the
> "upgrade upgrade" to perform ranking.

Ted Edwards

unread,
Dec 17, 2001, 1:25:25 PM12/17/01
to
Robert Bernecky wrote:

> Ted missed an earlier chapter here, I believe.

Who missed what?

> APL*STAR was written,

Spring 1970 to '73, IIRC.

AFAIK, APL*STAR was the first implementation to have nested arrays.
APVs were handled as described earlier. Also, many operations that did
not alter data values (only structures) could be handled by making a new
header and pointing to the data. It also did not replicate function
arguments unless they were altered in the function in ways that we could
not handle (yet). Clearly all the above, especially nested arrays,
needed good handling of pointers.

It is true that many of these ideas had been discussed in the literature
at that time but I believe APL*STAR was the first impplementation of
them.

Ted

stevan apter

unread,
Dec 18, 2001, 9:46:04 AM12/18/01
to
many individuals had a hand in A/A+, but two people so dominate the overall
intellectual landscape that they should be considered the authors of this
language: arthur, the interpreter writer, and joel kaplan, capo di tutti
capi.

other people -- dozens, i think -- contributed to the current form of A+.
that's partly what made the project so much fun. and despite dark
predictions about what would happen if arthur were snatched by aliens
or moved to california (hey, that's redundant), the interpreter passed
several times from one pair of hands to another without ensuing calamity.

as far as i know, every keystroke in the k interpreter is arthur's work.


"Josef Sachs" <sa...@panix.com> wrote in message news:o91yi6d...@panix1.panix.com...

Robert Bernecky

unread,
Dec 22, 2001, 11:22:10 AM12/22/01
to
Roger Hui wrote:
>
> I made no claims about the utility of APV.
> All I am saying, is that IF the interpreter has APVs,
> then things like grade, sum, reverse should take
> constant time, not the linear time that you said.

Perhaps we're talking at cross-purposes here:

a. Addressing polynomials [Guibas & Wyatt] permit a large
number of structure and selection primitives to be executed
in fixed time, as you point out, and as I pointed in the penultimate
paragraph below.

b. A number of non-linear operations (e.g., searching and sorting)
can be done in linear time on APVs AND on permutation vectors.

Bob

--

Robert Bernecky

unread,
Dec 22, 2001, 11:22:11 AM12/22/01
to
Roger Hui wrote:
>
> In J, grade on a permutation vector takes linear time.
> In particular, the second grade in grade grade x
> takes linear time.

That's interesting. Do you care to tell us how you detect that it's a
PV?
I can think of three ways:

a. Static analysis [as APEX does].
b. Every array descriptor contains a bit indicating that it's a PV.
c. Upgrade figures it out in an initial pass.

Bob

Robert Bernecky

unread,
Dec 22, 2001, 11:22:11 AM12/22/01
to

Ted:

I believe you are correct that APL*STAR was the first implementation
of the ideas pioneered by Abrams, as well as the first implementation
of many other good ideas!

Bob

Roger Hui

unread,
Dec 22, 2001, 2:10:39 PM12/22/01
to
Answer: c.


----- Original Message -----
From: "News Gateway" <owner...@sol.sun.csd.unb.ca>
To: <AP...@LISTSERV.UNB.CA>

Sent: Saturday, December 22, 2001 09:24 AM
Subject: Re: Space efficient sorting of arrays


> X-From: Robert Bernecky <bern...@acm.org>

Roger Hui

unread,
Dec 22, 2001, 2:11:04 PM12/22/01
to
Bob Bernecky writes on Saturday, December 22:

> Perhaps we're talking at cross-purposes here:
>
> a. Addressing polynomials [Guibas & Wyatt] permit a large
> number of structure and selection primitives to be executed
> in fixed time, as you point out, and as I pointed in the penultimate
> paragraph below.
>
> b. A number of non-linear operations (e.g., searching and sorting)
> can be done in linear time on APVs AND on permutation vectors.

My purpose was to contradict your statement,

This gave us linear-time algorithms for all of the above
except reverse.

where "above" was a list of things I wanted APL2 timings on:

x{is}?5e5{rho}1e6
a{is}1+2{times}{iota}5e5

{grade up}x vs. {grade up}a
x{iota}x vs. a{iota}a
a{iota}x vs. a{iota}a
+/x vs. +/a
{reverse}x vs. {reverse}a

You say these things take linear time for an APV; my retort is
that these things should take constant time.

Again, I have no comment here on how useful these things are,
how often they appear in actual code, how some other technique can
give speed ups on a larger class of problems, how wonderful
compilers are vs. interpreters, etc. Just that, IF there are APVs,
how fast can these operations on APVs be computed?

Reply all
Reply to author
Forward
0 new messages