A careful reading of the dictionary indicates that evaluation
order within an operator is unspecified. Everything points to
an unspecified order; nothing points to a particular order.
In any case, an explicit, pedantic statement (one statement
covering all operators) of this takes at most one sentence --
surely insufficient grounds for all this message traffic?
The dictionary does tell you what the word formation rules are
(which covers the symbol \.); it does not tell you (nor should it
tell you) how to implement those rules.
In the specific case of u\.y, you really must read the dictionary
(quoted above) more carefully. The statement is descriptive rather
than prescriptive; it describes the array u\.y but does not specify
how that array might be computed.
> Regardless of side effects, I say that you must make explicit
> what happens in so common a case as
>
> (i. 2) 0 0} (i. 2)
>
> either by saying the results are unpredictable or
> by revealing the evaluation order.
The one statement covering evaluation order within an operator
would cover this case.
>> Regarding performance characteristics: No APL interpreter
>> or C compiler or other programming language that I have ever
>> used provides the sort of information requested by Rich and Miller.
> With C compilers, the source for the libraries is often distributed.
You seem to think that having the source would be the answer to
performance questions. I stand by my original statement, "No APL
interpreter or C compiler ... provides the sort of information ...".
> >On the other hand,
> >J provides tools for monitoring the time and space characteristics
> >of computations, so that it is quite easy for the user to obtain
> >answers to performance questions in specific cases.
> That seems to me like saying that Toronto has helpful police,
> so I will not need a map if I come to the Users' Conference.
A better analogy is as follows: you want answers to lots of
specific questions: How do I get from King & Bay to the
University of Toronto? How do I get downtown from the airport?
How do I get from City Hall to the Island Ferries? Rather than
have me answer all the specific questions, or have a list
of lots of pairs of points a and b and how to get from a to b,
I am providing you with a map, the 6!:2 and 7!:2 facilities,
with which you can answer these questions for yourself.
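For example, a quick session sketch (the sentences being measured are
arbitrary; 6!:2 returns time in seconds and 7!:2 returns space in bytes):
   6!:2 '+/ i. 100000'      NB. time one evaluation, in seconds
   10 6!:2 '+/ i. 100000'   NB. average over 10 runs via the left argument
   7!:2 '+/ i. 100000'      NB. space used by the evaluation, in bytes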
> If a tacit function includes a reference to an explicitly-defined
> conjunction, how often is the conjunction interpreted?
Once.
> Two points:
>
> 1. I don't do this kind of thing, but I would think that order
> of evaluation is important for people who deal with a very large
> dynamic range of data, such that they worry about floating-point
> effects when large and small values are added.
Regardless of the application, regardless of how nice it would be,
regardless of how many times or how strongly you wish it, regardless
of how many messages are sent to comp.lang.apl, regardless of
whatever reason, ... it would be unwise to depend on a particular
order of evaluation within an operator.
> 2. You could see if a function has side effects by looking for
> =: and ". , and perform those in the simple-minded order, leaving
> optimizations for functions known to be well-behaved. (Honestly,
> this is what I guessed you would be doing before I started
> checking).
>
> Of course, you still would need to tell us what the simple-minded
> order is!
I disagree; I don't need to do anything.
> Suppose I have a file that I am trying to break into atoms,
> but the file format is such that I can't find atom delimiters
> except by parsing the file in sequential order. (Maybe it's
> a MIDI file or a compressed data stream). Assume for simplicity
> that I know the number of atoms. ...
Regardless of the application, it would be unwise to depend on a
particular order of evaluation within an operator.
I happen to think that this particular problem is a variation
of the transitive closure problem, a version of which was
previously solved in this forum using an n*^.n algorithm
(breaking a long string of text into lines of width <: w).
The key is to ask the question, if position i were a break,
where is the next break? But please don't ask me (again) to
solve the problem for you: a solution involves the details of
the MIDI or whatever format, and I have no wish to dive into
pages and pages of MIDI specifications.
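To make the "next break" idea concrete, here is a toy sketch in J.
Assume (hypothetically) that nxt has already been computed for every
position, so that i { nxt is the next break if position i were a break;
the chain of actual breaks then falls out by iterating to a fixpoint
(in current J, u^:a: collects the whole trajectory):
   nxt =. 3 5 5 7 7 9 9 10 10 10 10   NB. hypothetical next-break vector
   ({&nxt)^:a: 0                      NB. breaks reachable from position 0
0 3 7 10
Computing nxt itself, all positions at once, is where the real work
(and the n*^.n bound) lives; those details depend on the file format.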
> 1. My program may break if you change order of evaluation in
> the interpreter, or if I get a compiler.
Precisely; this is a good argument for not depending on a particular
order of evaluation within an operator.
> 3. Suppose I had tried
> (atomct $ 0) nextatom"0 _ fileinfo
> as my stab at improvement. You may, much later, (or even
> now, who knows?) choose to detect equal arguments and convert
> my line to a single call to nextatom, breaking my program that
> way.
Hmm, I hadn't thought of this particular bit of economization.
Interesting idea.
> 4. Worst of all, if I am a big enough customer I will threaten
> you with penury if you make such changes, and stultify
> development.
Regardless of how big you are, regardless of whatever you threaten
me with, it would be unwise to depend on a particular order of
evaluation within an operator.
Cliff Reiter writes on Thursday, May 16:
> The question of what comes first
> in a+b is sort of interesting. I think b gets done first in all current
> implementations, but it is easy to test, and the same comment about
> parallelism applies; at one point I was using PC/Unix
> systems with different evaluation orders and I routinely caused side
> effects. Very tricky to keep track of things. I know your question was
> toward a slightly different order of operations, but I think that there
> is hope for parallelizing 1 2 3 + 3 4 5 as I said above, so order wouldn't
> make sense. Of course, this means we need to be careful whenever
> causing side effects.
To emphasize the point about "a slightly different order of operations":
In J, the evaluation order of a f b _is_ specified by the parsing rules
(b then f then a then a f b); it is the evaluation order within an
operator that is (deliberately) unspecified.
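A small illustration of the specified part (a sketch; the verb f and
the global log are made up for the purpose):
   log =: ''
   f =: 3 : 'y [ log =: log , ": y'   NB. identity, but records its argument
   (f 1) + (f 2)
3
   log    NB. the right argument was evaluated first, as parsing specifies
21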
(i. 2) 0 0 } i. 2
is what some used to call a "pornographic expression" in APL. (Item 0
is written twice, once with 0 and once with 1, so the result is 0 1 or
1 1 depending on which write happens last.)
Of course, there's always the possibility of forming such a thing, in
effect, as an unexpected result of other calculations, but it is so
obviously fraught with danger and subject to changes in the underlying
mechanism of the interpreter that it simply ought to be avoided.
On the other hand, I see nothing wrong with having a list of such
dangerous/pornographic sentences. And, just as modern English
dictionaries characterize some expressions as sub-standard, vulgar,
and the like, so the J Dictionary could do the same. (Not that I
expect the J Dictionary to tell us ALL the "wrong" things you might
try to express.)
--
Murray Eisenberg Internet: mur...@math.umass.edu
Mathematics & Statistics Dept. Voice: 413-545-2859 (W)
University of Massachusetts 413-549-1020 (H)
Amherst, MA 01003 Fax: 413-545-180
I suppose so. Blood has been spilt over the interpretation of words
in the Bible; you might be honored that the Dictionary is taken as
seriously.
>> The Dictionary... says "u\.y has #y items resulting from applying
>> u to suffixes of y, beginning with one of length #y and continuing
>> through a suffix of length 1".
>
>In the specific case of u\.y, you really must read the dictionary
>(quoted above) more carefully. The statement is descriptive rather
>than prescriptive; it describes the array u\.y but does not specify
>how that array might be computed.
Trying with good will to read the passage again, I still don't see
how I could know that the statement is descriptive rather than
prescriptive. Both "beginning" and "continuing" can refer to
order of computation just as much as order of array elements.
But of course, given the disclaimer that order of evaluation is
undefined, the statement would be unambiguous.
>A better analogy is as follows: you want answers to lots of
>specific questions: How do I get from King & Bay to the
>University of Toronto? How do I get downtown from the airport?
>How do I get from City Hall to the Island Ferries? Rather than
>have me answer all the specific questions, or have a list
>of lots of pairs of points a and b and how to get from a to b,
>I am providing you with a map, the 6!:2 and 7!:2 facilities,
>with which you can answer these questions for yourself.
The great thing about a map is that it lets me view an
entire area at once, answering many questions simultaneously,
even answering questions I might not have thought to ask.
If I see a river, I know to look for bridges; if I see
an expressway I stop probing amongst the small streets.
To strain our metaphor, if you have given me
a map, it is a big map in a very dark room, and I have
only a penlight - I cannot see the whole region.
Here is an example. I am wondering whether +@+ or +@:+
will be faster. On the one hand, I favor +@:+ because
the operator works on a bigger array; but on the other
hand I fear that +@:+, if it creates a full-size
intermediate copy of the array, might have more page
faults if the array is large enough to overrun physical memory.
Of course, I may be completely misapprehending the
looping structure inside the interpreter.
There may, or may not, be a crossover point at which
one version becomes dramatically slower. To find this
point, I have to have some idea of the amount of physical
memory allotted to my application, and I must make some
guess about memory usage inside the interpreter, and
I then must stab around with different-sized operands and
hope I didn't miss the threshold - if it exists. Since the
testing is on the border of the physical-memory limit, the
tests are slow, and they have to be run multiple times
(i.e., via the left argument of 6!:2) to ensure that they don't
sample some artifact of garbage-collection. And
this was all to understand the behavior of +@+ ! Naturally
when I get a new interpreter I have to run all these
probes again.
(As I have been typing this I have been running these tests
on my J2.04 system. After 20 minutes I cannot give
a conclusive answer.)
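For the record, the probing itself is straightforward even if the
interpretation is not (array sizes here are arbitrary):
   y =. ? 1000000 $ 0       NB. a large vector of random floats
   10 6!:2 '+@+ y'          NB. elementwise: +@+ applies at rank 0
   10 6!:2 '+@:+ y'         NB. whole-array: +@:+ applies at rank infinity
   (7!:2 '+@+ y') , 7!:2 '+@:+ y'   NB. compare working space directly
It is the crossover behavior near the physical-memory boundary, not
the measurement, that is hard to pin down.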
>I happen to think that this particular problem is a variation
>of the transitive closure problem...
>The key is to ask the question, if position i were a break,
>where is the next break? But please don't ask me (again) to
>solve the problem for you...
Ah, thanks for the tip. If you mentioned this before I didn't
see it. Not only is this n log n, it vectorizes the
processing (if the code is not data-dependent).
If the code is data-dependent, I guess the answer is to go
through the data to find the breaks, then use ;. to break
it into atoms & process them - assuming that ;. assembles
its results in linear time.
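For example (a sketch, with a made-up fret vector):
   b =. 1 0 0 1 0 1 0 0 0   NB. hypothetical: 1 marks the start of each atom
   b <;.1 'abcdefghi'       NB. box each atom; segments begin at the 1s
+---+--+----+
|abc|de|fghi|
+---+--+----+
after which each box can be processed with u&.> .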
Henry Rich
FWIW, single-element-wise catenation does not have to be
O(n^2) ... you can trade space for time -- double the space
allocation when the array grows and keep separate "allocated"
and "used" sizes -- and reduce the amortized cost to O(n). I
would not especially like to see this in the interpreter,
though; the cases where you need it are [well, for me at least]
relatively rare, and you can get the same performance boost
by writing the array-size management at the app level when you
do need it.
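A minimal app-level sketch in J (the globals BUF and N, and the
initial size, are made up for illustration):
   init =: 3 : 0
    BUF =: y $ 0     NB. allocate y cells up front (y >: 1)
    N   =: 0         NB. count of cells actually used
   )
   append =: 3 : 0
    if. N = # BUF do. BUF =: BUF , BUF end.  NB. full: double the allocation
    BUF =: y N } BUF  NB. amend one cell; no per-item catenation
    N   =: N + 1
   )
   used =: 3 : 'N {. BUF'   NB. the logical array is the first N cells
e.g. init 4 and then append 7 five times leaves used '' as 7 7 7 7 7.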
Finally, most C compilers do NOT come with source code for the
compiler, samples of generated code for various C language
constructs, or source for the standard libraries (though
sometimes you can BUY the library source separately). Even with
library source, you'd have to write a test harness and timing code
to get detailed info on performance for various common or
especially interesting cases.
// mike
> >In the specific case of u\.y, you really must read the dictionary
> >(quoted above) more carefully. The statement is descriptive rather
> >than prescriptive; it describes the array u\.y but does not specify
> >how that array might be computed.
> Trying with good will to read the passage again, I still don't see
> how I could know that the statement is descriptive rather than
> prescriptive. Both "beginning" and "continuing" can refer to
> order of computation just as much as order of array elements.
First thing, the statement refers to u\.y, which is the RESULT of
the computation, not \. by itself, which is the computation. When one
speaks of the result, one speaks of the ARRAY. The beginning of this
array could, without much effort, be assumed to be its item 0,
0 { u\.y. Perhaps you thought the statement said:
executing u\.y creates #y items...
which, of course, is a different matter. Yes, "beginning" and
"continuing" can refer to order of computation, but only if the
subject is a computation, not if it isn't.
... this seems more like a grammar lesson, which of course it is
--
|\/| Randy A MacDonald |"You just ASK them?"
|\\| ra...@godin.on.ca | Richard P. Feynman
BSc(Math) UNBF '83 | APL: If you can say it, it's done.
Natural Born APL'er | *** GLi Info: in...@godin.on.ca ***
| Also http://www.godin.com/godin/
------------------------------------------------<-NTP>----{ gnat }-
This will work. But note: inserting
the new element might require copying the whole array, which
would make the proposed method worse than the simple way.
I have a tip from the top that } into an unboxed array
does not copy the array, but I believe that } into a boxed
array creates a new copy.
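One could at least probe the space side of this with 7!:2, e.g.
(sizes arbitrary; the in-place claim is hearsay, hence the probe):
   y =: i. 100000
   7!:2 '_1 (0}) y'       NB. general amend: space for a fresh copy
   7!:2 'y =: _1 (0}) y'  NB. special form name =: x m } name may update in place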
Two other methods are: (1) write the objects to a file and then
read the items back en bloc; (2) use the J namespace as
an associative memory, creating variables 'OBJ0000', 'OBJ0001',
etc., which you later glom together.
The performance of these methods depends on the filesystem
or the name-hashing method used by the interpreter, and I
have no idea how to characterize their performance.
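A toy version of the namespace method (the OBJ names and the
zero-padding scheme are illustrative only):
   ('OBJ' , ": 0) =: 3 1 4        NB. store item 0 under a generated name
   ('OBJ' , ": 1) =: 1 5 9
   ; ".&.> ('OBJ' , ":)&.> i. 2   NB. gather: evaluate each name, raze together
3 1 4 1 5 9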
Henry Rich
No, it wouldn't. Suppose, for instance, you are adding the 513th
item to an array which started with one item. Then you have to
allocate and copy/initialize 1024 cells. Suppose you've been adding
one cell at a time. Then you had to
    cells copied/allocated    when cell added
              2                     2
              4                     3
              8                     5
             16                     9
            ...                   ...
            256                   129
            512                   257
           1024                   513
So you've had 10 rounds of copy/allocate with a total of 2046 cells
allocated, and you won't need to re-allocate until you add cell 1025.
The number of rounds is logarithmic in the number of cells appended;
the total cells copied/allocated is bounded between 2x and 4x the
number of cells appended (so linear). There is sometimes a high price
(when you have to double and copy), but the AMORTIZED cost is linear.
[And of course you have to range-check, but APL/J does that anyway.]
If you trigger the re-alloc on item count, this gives average linear-
time allocation for incrementally grown arrays of any rank >0.
The problem with this strategy is that you can only use it when you
have plenty of "excess" memory; if memory starts to get tight you
pretty well have to revert to the "usual" strategy (and maybe even
run around trying to scavenge cells from over-allocated live
arrays), and this makes it, probably, on balance, a bad strategy.
The "usual" way is linear in the per-round overhead (it happens every
time you append a cell), and it's quadratic (copies 1+2+ ... +n when
you add the n+1st, total of n(n-1)/2 ), so overall, quadratic.
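Whether a given interpreter's , is actually quadratic for this use is
itself answerable with 6!:2 (a sketch; grow is a made-up test verb):
   grow =: 3 : 0
    r =. 0 $ 0
    for_i. i. y do. r =. r , i end.  NB. one-item-at-a-time catenation
    r
   )
   (6!:2 'grow 2000') % 6!:2 'grow 1000'  NB. near 4 suggests quadratic, near 2 linear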
// mike