Representation of numbers

luserdroog

Apr 5, 2015, 11:09:41 PM

I've got the basics of frame agreement appearing to work,
and begun to flesh out the handling of cells larger than
scalars. And before going further with adding adverbs
(although I've added some pieces in support of adverbs),
I want to tackle the issue of numeric types.

Copying J and Kona (and presumably APL, too, but I haven't
actually looked at any true APL source code), I have a
single type tag for each abstract array. This can be
INT, DBL, CHR, SYMB, VRB, BOX, etc. indicating what kind of
data is stored in the "value" portion of the array memory.
In order to access the data, the value pointer must be
cast to the appropriate type. For INT and BOX, the size
of the element is (usually) the same, but for things like
CHR and DBL, the elements have radically different widths.

The unfortunate effect is that this varying width must be
dealt with by all array-manipulating functions -- which
pretty much means all *verb* functions. So even a function
like compression of an array with a boolean vector has to
have different cases to handle the different types, even
though it doesn't actually need the values themselves to do
its job.
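
To make the complaint concrete, here is a rough C sketch of a tagged
array and a compression routine that has to branch on the tag even
though it only moves elements around. The struct layout, the field
names, and the function name compress are assumptions made for
illustration; they are not taken from the interpreter's source.

```c
#include <assert.h>
#include <stddef.h>

/* Type tags named in the post; only three are sketched here. */
enum type_tag { INT, DBL, CHR };

/* Hypothetical array header: one tag for the whole array, untyped data. */
struct array {
    enum type_tag type;
    size_t n;       /* element count */
    void *value;    /* must be cast according to `type` */
};

/* Copy the elements of src whose mask byte is nonzero into dst.
   dst must have the same type and room for src->n elements.
   Returns the number of elements kept. */
size_t compress(const unsigned char *mask, const struct array *src,
                struct array *dst)
{
    size_t i, k = 0;
    assert(src->type == dst->type);
    switch (src->type) {
    case INT: {
        const int *s = src->value;
        int *d = dst->value;
        for (i = 0; i < src->n; i++)
            if (mask[i]) d[k++] = s[i];
        break;
    }
    case DBL: {
        const double *s = src->value;
        double *d = dst->value;
        for (i = 0; i < src->n; i++)
            if (mask[i]) d[k++] = s[i];
        break;
    }
    case CHR: {
        const char *s = src->value;
        char *d = dst->value;
        for (i = 0; i < src->n; i++)
            if (mask[i]) d[k++] = s[i];
        break;
    }
    }
    dst->n = k;
    return k;
}
```

The three cases differ only in the pointer type, which is exactly the
duplication that a uniform element width would remove.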

So what if they could all be california girls ... er, um...
integers. What if they could all be integers? So, I slept
on it and scribbled in my graph-paper notebook, and with
the recent experience of UTF-8 still swimming in my head,
I came up with this:

A "number"-encoded (32-bit) integer is conceptually divided
into two (16-bit) half-words. If an integer value fits in
a half-word, the encoded number is the half-word value
with the high half-word set to zero. Otherwise the top half
is a table-index into a BOX array and the low half is an
index into the selected table.

So to encode an integer,

    If int&BANK_MASK
        If int&BANK_MASK == BANK_MASK
            enc=int&IMM_MASK
        Else
            enc=new_fixnum(int)
    Else
        enc=int

Conversely to decode the value,

If the high half-word is zero,
then low half is an immediate value
and should be sign-extended to a full word.

If the high half-word is not zero,
then the high half is considered a "bank-selector"
and the low half is an index into the specified bank.
The value should be accessed from the appropriate table.
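
The encode and decode rules can be sketched in C roughly as follows.
This is only an illustration under stated assumptions: BANK_MASK and
IMM_MASK are given the values implied by a 16/16 split, the fixnum bank
is assumed to use selector 1, and the table is a single fixed array with
no chaining. The sketch also uses an explicit signed 16-bit range check
instead of the mask test. If BANK_MASK covers only the high 16 bits, a
value such as 32768 would pass the mask test as an immediate and then
decode as -32768 under the sign-extension rule; the range check
sidesteps that question.

```c
#include <assert.h>
#include <stdint.h>

/* Names from the post; the values are assumed from the 16/16 split. */
#define IMM_MASK   0x0000FFFFu
#define BANK_MASK  0xFFFF0000u
#define BANK_SHIFT 16

/* Assumed selector for the fixnum bank, and a single fixed-size table. */
enum { FIXNUM_BANK = 1, TABLE_SIZE = 65536 };

static int32_t fixnum_table[TABLE_SIZE];
static uint32_t fixnum_used = 0;

/* Store a full-width integer in the fixnum bank and return its handle. */
uint32_t new_fixnum(int32_t v)
{
    assert(fixnum_used < TABLE_SIZE);
    fixnum_table[fixnum_used] = v;
    return ((uint32_t)FIXNUM_BANK << BANK_SHIFT) | fixnum_used++;
}

/* Immediate if the value fits in a signed half-word, otherwise banked. */
uint32_t encode_int(int32_t v)
{
    if (v >= INT16_MIN && v <= INT16_MAX)
        return (uint32_t)v & IMM_MASK;
    return new_fixnum(v);
}

/* High half zero: sign-extend the low half. Otherwise look up the table. */
int32_t decode_int(uint32_t enc)
{
    uint32_t bank = (enc & BANK_MASK) >> BANK_SHIFT;
    uint32_t low = enc & IMM_MASK;
    if (bank == 0)
        return low >= 0x8000u ? (int32_t)low - 0x10000 : (int32_t)low;
    assert(bank == FIXNUM_BANK && low < fixnum_used);
    return fixnum_table[low];
}
```

With these definitions, decode_int(encode_int(x)) returns x for any
32-bit x until the table is full, and small values never touch the
table at all.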

Floating-point values are allocated in the flonum
table similarly to the treatment of the fixnum table.

I've only defined the first two banks so far, a bank
of INTs called fixnum, and a bank of DBLs called flonum.
As the names should suggest, the idea was also influenced
by the discussion of 1970s-era Lisp interpreters in the
fine book Anatomy of Lisp.

As the tables fill up, they will be chained up with
new tables allocated in the top-level bank array (a BOX-
type abstract array).
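
One possible shape for the bank array with chaining, sketched in C. The
post does not say how a decoder tells a fixnum table from a flonum table
once several of each exist, so this sketch assumes a per-bank kind tag.
The names struct bank, open_slot, and decode_as_double, the 256-bank
limit, and the use of malloc are likewise assumptions for illustration
only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { TABLE_SIZE = 65536, MAX_BANKS = 256 };
enum bank_kind { UNUSED = 0, FIXNUM, FLONUM };

/* Hypothetical top-level entry: what the table holds and how full it is. */
struct bank {
    enum bank_kind kind;
    uint32_t used;
    void *slots;    /* int32_t[TABLE_SIZE] or double[TABLE_SIZE] */
};

static struct bank banks[MAX_BANKS];  /* selector 0 means "immediate" */
static uint32_t nbanks = 1;
static uint32_t current[3];           /* open bank per kind, 0 = none yet */

/* Reserve one slot of the given kind, opening a new table when the
   current one is full. Returns the encoded handle for that slot. */
static uint32_t open_slot(enum bank_kind kind)
{
    uint32_t b = current[kind];
    if (b == 0 || banks[b].used == TABLE_SIZE) {
        size_t width = kind == FIXNUM ? sizeof(int32_t) : sizeof(double);
        assert(nbanks < MAX_BANKS);
        b = nbanks++;
        banks[b].kind = kind;
        banks[b].used = 0;
        banks[b].slots = malloc(TABLE_SIZE * width);
        assert(banks[b].slots != NULL);
        current[kind] = b;
    }
    return (b << 16) | banks[b].used++;
}

uint32_t new_fixnum(int32_t v)
{
    uint32_t h = open_slot(FIXNUM);
    ((int32_t *)banks[h >> 16].slots)[h & 0xFFFFu] = v;
    return h;
}

uint32_t new_flonum(double d)
{
    uint32_t h = open_slot(FLONUM);
    ((double *)banks[h >> 16].slots)[h & 0xFFFFu] = d;
    return h;
}

/* Read any encoded number as a double, whichever bank it lives in. */
double decode_as_double(uint32_t enc)
{
    uint32_t b = enc >> 16, i = enc & 0xFFFFu;
    if (b == 0)   /* immediate: sign-extend the low half */
        return i >= 0x8000u ? (double)i - 65536.0 : (double)i;
    assert(b < nbanks && i < banks[b].used);
    if (banks[b].kind == FIXNUM)
        return ((int32_t *)banks[b].slots)[i];
    return ((double *)banks[b].slots)[i];
}
```

Handles stay valid when a new table is opened, since existing tables
are never moved; only the "current" table for that kind changes.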

So the domain of all verbs will be this encoded number
type which will allow individual numeric values to
overflow into a larger type without having to convert
the entire array of which it is a part. An array of
mostly integers can also contain 'null's this way,
just like "real" APL.

It just occurred to me that I'm not sure how the CHR type
fits into the picture here. Maybe they should be banked
(or *interned*) as well. Again citing compression.

Any comments much appreciated.

--
luserdroog

luserdroog

Apr 5, 2015, 11:58:17 PM

On Sunday, April 5, 2015 at 10:09:41 PM UTC-5, luserdroog wrote:
> I've got the basics of frame agreement appearing to work,
> and begun to flesh out the handling of cells larger than
> scalars. And before going further with adding adverbs
> (although I've added some pieces in support of adverbs),
> I want to tackle the issue of numeric types.
>

<snip>

> Any comments much appreciated.
>
> --
> luserdroog

Implementation details via the related topic in
comp.lang.c 4/5/2015:
Subject: A panacea for numeric types in apl interpreter
https://groups.google.com/d/topic/comp.lang.c/gmiJ1er4YoQ/discussion
