EQ and FIXNUMS

Vladimir V. Zolotych

unread,

Sep 24, 2001, 5:15:59 AM9/24/01

to

I've accustom myself to think of EQ as doing pointer
comparison. Once interned a particular symbol remains
so symbols that print the same should be EQed. What take
place with numbers and chars in implementation where
EQ works for them ? Does it mean that all objects representing
fixnums, for example, have "standard" places in Lisp ?

--
Vladimir Zolotych gsm...@eurocom.od.ua

Erik Naggum

unread,

Sep 24, 2001, 6:51:37 AM9/24/01

to

* "Vladimir V. Zolotych" <gsm...@eurocom.od.ua>

> I've accustom myself to think of EQ as doing pointer comparison. Once
> interned a particular symbol remains so symbols that print the same
> should be EQed. What take place with numbers and chars in implementation
> where EQ works for them ? Does it mean that all objects representing
> fixnums, for example, have "standard" places in Lisp ?

You can think of eq as a one-instruction machine-word comparison. For
this explanation to be adequate, you need to know how Lisp represents
pointers to its objects _and_ fixnums and characters in the same machine
word. Let me take Allegro CL as an example

(inspect #\a)
character #\a [#x0000030e] char-code #x0061

(inspect #x0061)
fixnum 97 [#x00000184]

The binary representation reveals the internals

(write #x30e :base 2)
1100001110

(write #x184 :base 2)
110000100

(write #x61 :base 2)
1100001

We see that a character has three extra bits (110) and fixnums two (00),
and these bits work as type bits. (Actually, fixnums also have three,
but the least significant bit doubles as a type bit, so both 100 and 000
identify a fixnum.) We notice that on 32-bit hardware, no pointers can
have non-zero values in the least two significant bits, so they are free
for other uses, and the hardware support for adding small integer value
sto pointers at deferencing time, coupled with good hardware support for
unaligned access traps, we get type-checking for free, too. Thus we see
that eq can use machine-word comparison instructions directly.

If you wish, you can consider the machine-word representation of fixnums
and characters as fixed, virtual addresses of constant objects.

eql, on the other hand, will have to check for two bignums and compare
them numerically.

///
--
Why did that stupid George W. Bush turn to Christian fundamentalism to
fight Islamic fundamentalism? Why use terms like "crusade", which only
invokes fear of a repetition of that disgraceful period of Christianity
with its _sustained_ terrorist attacks on Islam? He is _such_ an idiot.

Kent M Pitman

unread,

Sep 24, 2001, 9:45:53 AM9/24/01

to

"Vladimir V. Zolotych" <gsm...@eurocom.od.ua> writes:

> I've accustom myself to think of EQ as doing pointer
> comparison. Once interned a particular symbol remains
> so symbols that print the same should be EQed. What take
> place with numbers and chars in implementation where
> EQ works for them ? Does it mean that all objects representing
> fixnums, for example, have "standard" places in Lisp ?

Typically, an object will be represented as a word containing tag part
and a pointer part. For a symbol, the tag part is generally a symbol
tag, and the pointer part an address of a symbol block (the data for
name, value cell, function cell, plist, etc.) For a number and a
character, there's really no data to point to other than a block of
bits. It seems silly to allocate a pointer 0 just to point through to
a value of 0, a pointer 1 to point to a value of 1, etc. So instead,
I think implementations typically do not create backing store for
characters and integers, or at least not for them within a certain
range, preferring instead to just use the pointer as immediate data.
But conceptually they are a pointer compare.

Nevertheless, there can be virtual machines on which Lisp runs where
it is necessary to violate this. In Maclisp, for example, on the
PDP10, the first 16 machine locations were registers. Sometimes one
would use these as locations while at other times one would use them
as registers. Maclisp used interned small fixnums (it actually had
backing store for the first few integers--the exact number wasn't
advertised, but it was at least a certain size and the actual number
was that minimum + how ever many words were left over to round off to
an even block in the lisp image :-). Below that number, integers were
EQ; above that number, integers were not EQ. Mostly. The one
exception was that sometimes a single integer could need to reside in
a register efficient for number-compilation reasons, and then it would
not be EQ to its normal self. This was very confusing to any users
who tried to rely on it for identity because a register location could
take on any value, and the same pointer might be EQ at one pmoment and
not the next. EQL was invented to paper over this, by saying that
conceptually we would compare numbers AS IF their magnitude was their
pointer, even though sometimes we don't do it that way. EQ was left
behind to do fast comparison by those who really know what they are
doing because designers were paranoid that we would lose critical
speed if people always had to say EQL even in mission-critical inner
loops where their vendor said EQ would have worked.

The right way to think of EQL is that it defines a virtual machine in
which all pointers to numbers and characters are pointers to backing
store where for integers the store contains their numeric value and for
characters the store contains their full code (charset+code), whether
or not they are actually represented that way in the machine. If you
like this model, you can literally never use EQ in CL and always be happy.
EQ exists for those who want to use it but should basically always be
ignorable. (There are some REALLY obscure exceptions but well-styled
code never runs into them.)

Historical aside:

Some people found fun ways to exploit this thing about fixnums in
Maclisp. It was a kind of numeric cons if you compiled out the
type-checking; you could manipulate it with rplaca/rplacd but the type
of it was integer. For example, you could rplacd (change the right
half of the word) such a a "large" (uninterned) fixnum in such a way
as to make it small again but not EQ to most small integers. (I
remember Drew McDermott complaining bitterly when we created Yale
Scheme that it didn't have this feature of rplacd'ing integers.)
Maclisp had no character datatype, for example; one just used
integers. But I made a vector of 128 such integers by taking an
uninterned integer, changing its value to be small (from 0 - 127), and
then called that array my "characters". I made #/A, #/B,
etc. (Maclisp used "/" not "\") return those objects (which were
perfectly fine integers and worked for the input/output operations).
And I made a printer hook so that anything EQ to my array elements
printed as #/A, since the standard Lisp #/ macro worked for input but
lost the identity of the object as a character at read-time, making
(tyo #/A) print as (TYO 65.), when I wanted (TYO #/A). Kind of
hackish, but such were the nature of the times. Maclisp was where
people like me learned to aspire to work in a more elegant language.

Kalle Olavi Niemitalo

unread,

Sep 24, 2001, 10:28:52 AM9/24/01

to

Kent M Pitman <pit...@world.std.com> writes:

> EQ exists for those who want to use it but should basically always be
> ignorable.

If I know that EQ and EQL will have the same result (because the
values being compared aren't numbers or characters), which one
should I use?

Pierre R. Mai

unread,

Sep 24, 2001, 10:12:39 AM9/24/01

to

"Vladimir V. Zolotych" <gsm...@eurocom.od.ua> writes:

> I've accustom myself to think of EQ as doing pointer
> comparison. Once interned a particular symbol remains
> so symbols that print the same should be EQed. What take
> place with numbers and chars in implementation where
> EQ works for them ? Does it mean that all objects representing
> fixnums, for example, have "standard" places in Lisp ?

In most current implementations, what EQ does is more akin to a
machine-word comparisson. For most objects, this means a
pointer-comparisson, since they are non-immediate, that is they are
always passed around via their pointers.

Certain small, unmutatable objects, that can be made to fit entirely
(including any necessary tag-bits, see below) into a machine-word, are
passed around as immediate objects, that is instead of a pointer, they
are passed around themselves inside the machine-word. For current 32bit
or larger machines, this normally includes characters (8-16 bits), and
fixnums (usually 32bit-tag bits for that reason), but it could include
other smallish stuff.

This works, because Common Lisp provides you with no operators that
let you mutate (parts of) a character or fixnum in-place (unlike
e.g. conses), and hence the only useful equality predicate on them is
that of value-equality, i.e. two numbers are "identical" IFF they have
the same value. You cannot normally discern two different numbers
which have the value 4. Therefore immediate representation of chars
and numbers is possible.

In order to discern the type of machine-word at hand (pointer, fixnum
or char), you need to make fixnums and chars distinct from pointers in
their bit-level representation. This is nowadays normally achieved
through tag bits, which are usually taken from the lower-end of the
machine-word, since pointers on modern hardware should almost always
be at least word-aligned (4 bytes for 32bit machines), if not more,
which usually leaves the lower 2-3 bits unused anyways.

One possible scheme might be to use 2 tag bits, in the following
manner:

- 00 indicates a pointer to a non-immediate (i.e. normal) lisp object
- 01 indicates an immediate character object, with the other 30 bits
taken up by the (zero-padded) character number
- 10 indicates an immediate fixnum object, with the other 30 bits
taken up by the fixnum itself
- 11 unused, or used to distinguish other immediate objects, e.g. NIL,
etc.

Thus the implementation is able to just use the normal machine-word
comparisson to compare two objects, either for pointer-equality in the
case of non-immediate objects, or for a simple combined type and value
comparisson for immediate objects.

That said, in reality the tag bit scheme used is normally a little
different, in order to allow fixnum arithmetic to use the built-in
hardware instructions without having to shift out and then shift back
in the tag bits, as well as letting fixnums be as large as possible.
This scheme is enabled by the fact that most modern hardware has
addressing modes that allow the specification of a constant offset to
a given address at no additional cost. Hence we'd use something like
the following to get us 31bit fixnums with a 2bit tagging scheme:

- 00 indicates an even (last-bit 0) fixnum, with 31 bits (including
the first tag bit) for the number itself.
- 01 indicates a pointer to a non-immediate object. The objects are
always accessed with a constant offset of -1, thus masking out the
tag bits without additional effort.
- 10 indicates an odd (last-bit 1) fixnum, with 31 bits (including the
first tag bit) for the number itself.
- 11 indicates a character object

This scheme would mean that we get 31bit fixnums, which we can use the
normal arithmetic routines on (if we take care about underflows, and
shifting the result a bit to the right/left on multiplications and
divisions).

In reality, most implementations want to use an additional tag bit, in
order to differentiate between additional immediate objects (and
possibly different kinds of pointers, for optimized conses, etc.,
and/or to support the GC), so that we usually end up with something
like the following scheme, which is currently employed in CMU CL:

- 000 even fixnum
- 001 function pointer
- 010 even other-immediate (header-words, characters, symbol-value
trap value, etc.)
- 011 list pointer
- 100 odd fixnum
- 101 structure pointer
- 110 odd other immediate
- 111 other-pointer to data-blocks (other than conses, structures, and
functions)

Other implementations will of course differ in details, but the 3bit
tagging scheme with 000/100 (which really is a 2bit tag in that case)
for fixnums is fairly common across modern implementations on 32bit
standard hardware...

Regs, Pierre.

--
Pierre R. Mai <pm...@acm.org> http://www.pmsf.de/pmai/
The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We cause accidents. -- Nathaniel Borenstein

Barry Margolin

unread,

Sep 24, 2001, 11:40:10 AM9/24/01

to

In article <izn66a8...@stekt34.oulu.fi>,

Then use EQ, which doesn't have to check the type codes first to determine
if it can just do a pointer comparison. Think of EQL as being something
like:

(defun eql (x y)
(and (same-type-p x y)
(typecase x
(number (= x y))
(character (char= x y))
(t (eq x y)))))

If you know a priori that the typecase would just fall through to the last
case, there's no point in doing it.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Rob Warnock

unread,

Sep 25, 2001, 5:53:08 AM9/25/01

to

Barry Margolin <bar...@genuity.net> wrote:
+---------------

+---------------

Exactly, which is why I'd think you'd want EQL to start with an EQ check
*first*, since it's cheap. And since I don't find SAME-TYPE-P anywhere in
CLHS, I'd probably suggest something like this:

(defun eql (x y)
(or (eq x y)
(and (eq (type-of x) (type-of y))

(typecase x
(number (= x y))
(character (char= x y))

(t nil)))))

Unfortunately, I think this might contain a potential subtle bug for
some implementations (though not, I think, CMUCL or CLISP). TYPE-OF
might return the list form of type specifier for some numbers or
characters [per examples on the CLHS page for TYPE-OF], and if said
list is freshly cons'd, "(eql foo foo)" might return NIL. (Oops!)

One could change the EQ in the type equality test to EQUAL, but
that just pushes the problem to the definition of EQUAL (and slows
things down at the same time).

-Rob

-----
Rob Warnock, 30-3-510 <rp...@sgi.com>
SGI Network Engineering <http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy. Phone: 650-933-1673
Mountain View, CA 94043 PP-ASEL-IA

[Note: aaan...@sgi.com and zedw...@sgi.com aren't for humans ]

Erik Haugan

unread,

Sep 25, 2001, 9:48:19 AM9/25/01

to

* rp...@rigden.engr.sgi.com (Rob Warnock)

> Exactly, which is why I'd think you'd want EQL to start with an EQ check
> *first*, since it's cheap.

I was about to suggest the same thing, but then it struck me that the vast
majority of values compared probably are _not_ equal, so EQL should be
optimized for determining inequality rather than equality. In cases where
the values are in-equal, EQ gives you no clue. I guess implementations use
highly optimized low level code for this.

Anyway, since EQ is so cheap, I'm not sure it matters.

Erik

Erik Naggum

unread,

Sep 25, 2001, 10:20:37 AM9/25/01

to

* Erik Haugan <er...@haugan.no>

> Anyway, since EQ is so cheap, I'm not sure it matters.

eq is so cheap that it makes sense to inline a call to eq (i.e., make the
direct machine-word comparison) and then call out to a function like
eql-not-eq, which knows the values are not eq, to make the more complex
comparison. However, since the only relevant point of difference between
eql and eq in the usual modern implementations is when comparing numbers
with eachother, it might also make sense to limit testing to them and
avoid the cost of a function call for other types. I imagine that an
implementation could optimize this very heavily with clever use of type
codes if it really wanted to.