David Bakhash <ca...@mit.edu> writes:

> Is there a way to get sizeof information about an object?  i.e. how
> would I find out how many bits it takes to store something?
Lisp is not about this. It is not a language for describing memory usage--it is a language for describing what you want to do, while allowing the implementation to manage memory. Most commercial implementations give you access to other languages if you want the other behavior, but Lisp is defined in ways that allow the implementation to optimize or rearrange memory behind your back. Hash tables are an excellent example: in some implementations they change shape internally and even use different lookup tactics depending on what you store. For example, they might observe that you've stored integers very densely and substitute an array. Or they might use a simple alist rather than a hashing mechanism if the cross-over point for using the hash function at all has not been exceeded. In light of this, what does it mean to ask the size of an object? Suppose I said it's called sizeof--show me a program you'd want to write that uses the answer that would come back...
> > Is there a way to get sizeof information about an object?  i.e. how
> > would I find out how many bits it takes to store something?
> exceeded.  In light of this, what does it mean to ask the size of an
> object?  Suppose I said it's called sizeof--show me a program you'd
> want to write that uses the answer that would come back...
okay, Kent. You asked for it...
first off, I'm writing a compression algorithm whose optimizing cost function is nothing more than actual memory storage. Thus, at the end of the day, I want it to find an encoding for a hunk of data which minimizes _literally_ the amount of memory used. But this is not so simple. An example...
suppose I were trying to compress some text, and I noticed that it looked like:

"abcabcabd"
if I used the length of the encodings as the only contributor to cost, then I'd probably end up with a symbol set like:
'("a" "b" "c" "d" "abc")
in which case, the encoding of the above string would be:
'(4 4 0 1 3)
But, in building the encoding table, if I knew the _overhead_ of creating a whole new string (i.e. (sizeof "")) and used that in the cost function, then who knows? The symbol set might end up looking like:
'("a" "b" "c" "d" "ab")
and, the encoding for the above string would be:
'(4 2 4 2 4 3)
So, I'm not actually using it for the coding -- i.e. it's probably not gonna be in the code; it's just there to give me an idea of the overhead of certain low-level data structures.
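To make the idea concrete, here is a sketch (mine, not from the post) of the kind of cost function being described; STRING-OVERHEAD stands in for whatever number a hypothetical (size-of "") would report, and 16 is just a guess:

```lisp
;; Sketch of a table-plus-encoding cost function.  Each table entry
;; pays a fixed per-string overhead plus one unit per character; the
;; encoded text pays one code per emitted symbol.
(defun symbol-set-cost (symbol-set encoding &key (string-overhead 16))
  (+ (loop for s in symbol-set
           sum (+ string-overhead (length s)))
     (length encoding)))

;; With overhead 0 the longer entry "abc" looks free to add; with a
;; large overhead a smaller table can win even though the resulting
;; encoding is longer:
;;   (symbol-set-cost '("a" "b" "c" "d" "abc") '(4 4 0 1 3))
;;   (symbol-set-cost '("a" "b" "c" "d" "ab")  '(4 2 4 2 4 3))
```

The point is only that the overhead term changes which table the optimizer prefers, which is exactly why the poster wants a number for it.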
David Bakhash <ca...@mit.edu> writes:

> first off, I'm writing a compression algorithm whose optimizing cost
> function is nothing more than actual memory storage.  Thus, at the end
> of the day, I want it to find an encoding for a hunk of data which
> minimizes _literally_ the amount of memory used.  But this is not so
> simple.  An example...
And you're assuming, I guess, that (a) the system isn't also optimizing memory storage for you and that (b) there isn't sharing involved. To understand MY point of view, you have to understand that the language I have worked to create is not one that you have to give up as compilers get smarter. It might make sense in a string context to assume that strings are pretty much laid out as you expect, although I doubt that, even if you tried hard, you could find any place in the spec where it said they were. If I do (make-array (1+ most-positive-fixnum)) you might assume it will blow out for lack of space, but what if it just makes a sparse array? Is the implementation forbidden from this? No. I think it's purely an issue of commercial competition. Things people want will succeed and things people don't will fail, but the language is not trying to force an implementation not to provide services to the user, so the language doesn't say "if you ask for a big array, it will take correspondingly much space". Rather, it says "if you ask for an array of a given size, you'll get back an object which, when you use AREF, will get you the elements". But AREF is not obliged to do that as a memory access, and indeed implementations can differ on what they do storage-wise if you ask for element type (unsigned-byte 7).
On the topic of commercial implementation, btw, one implementation a while back chose to implement various potentially destructive operations as non-destructive (i.e., NREVERSE just used REVERSE, etc.) since the side-effect is not required. That was conforming. It also wasn't a popular design choice. I'd be surprised if they didn't end up changing it because the market demanded it. But if they didn't, then presumably it was a good commercial choice for their chosen market base. Indeed, whether an implementation even offers a compiler (or an interpreter) is not defined. Some offer both. Some only one or the other. The semantics of the operations are defined, not how the implementation must work. The market sorts out what the good and bad choices are.
Anyway, what you say makes some sort of vague sense if you ask for size-of (I like it with a hyphen, btw) of some string, but then you can pretty much use LENGTH. And to the extent that you can't, you have to ask what the stuff is that is not accessible to LENGTH. What if I have an implementation that uses a certain size header but compiles new code periodically and then updates the header size to be more compact in a case where a particular kind of header is being used a lot? You're assuming that now and for all time, headers are decided at compile time, but need they be? And if they're not, what happens if you store the objects in an object and then the system changes something behind your back?
You probably think I'm talking science fiction, but am I? Let's look at cdr-coding. Someone made an observation a long time ago that lists have a lot of NIL's in them, and on the lisp machine a lot of storage is saved by, instead of storing things in symmetric words, storing things in a word with 2 bits of info saying what the cdr is. Four bit configurations are possible, and they decode to: "the next word is the cdr", "the next word points to the cdr", "the cdr is nil" (with no "other word" needed anywhere), and "you're in error if you think this thing has a cdr". To the user who doesn't know any of this, these lists behave exactly as they would in any other lisp... except that generally I bet storage comes out AUTOMATICALLY more compact than in other implementations. Yet I bet if you ran your portable storage optimizer on it, you'd risk pessimizing the storage. For example, in a cdr-coded implementation, doing a RPLACD sometimes conses, because sometimes there isn't room enough in the cdr to store the thing you need and words have to get shuffled around to make room. You'd think this was fatal, but the statistical claim made is that even though some programs do RPLACD and that sometimes causes consing, it still doesn't matter, because most programs don't, and the benefits outweigh the risk.
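For the curious, the four cdr codes can be modeled in a few lines (a toy of my own devising, not any actual Lisp Machine's layout):

```lisp
;; Toy model of cdr-coding.  Each cell is a word plus a 2-bit code:
;;   :next     - the immediately following word is the cdr
;;   :indirect - the following word POINTS to the cdr
;;   :nil      - the cdr is NIL; no extra word is stored at all
;;   :error    - this word is not a list cell
(defstruct word
  payload     ; the car stored in this word
  cdr-code)   ; one of :next, :indirect, :nil, :error

(defun cdr-code-list (list)
  "Encode LIST as a vector of cdr-coded words: every cell but the
last gets :next, the last gets :nil, so an n-element proper list
takes n words instead of the 2n a cons-pair representation needs."
  (coerce (loop for (item . rest) on list
                collect (make-word :payload item
                                   :cdr-code (if rest :next :nil)))
          'vector))
```

The :indirect code is what makes RPLACD possible at all in such a scheme, and is exactly the case where a destructive operation can cons.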
Cdr coding is just one of many mechanisms that I think a system can use to optimize things IF the user doesn't get it in his head that the precious moments of his life are best spent trying to accidentally work against the system's ability to do this. You can't have two non-cooperating parties "optimizing" something, because they will destroy each other's good work. And the idea in Lisp is that the user is supposed to work on the domain things and leave the memory layout to the system. That may be a bad claim, but I think it is fundamental to Lisp's design and a key reason why Lisp programmers stay "above the fray" and focused on their domain problems, while other attacks on the same domain problems in other languages get bogged down in memory leaks, etc.
It's not that you can't model memory optimization. It's not that you can't contact a vendor and find out how they lay out memory and assure yourself that you're not working against THAT vendor. But the language as structured portably is not about telling vendors how to do their job, and so when you get done, you should NOT assume that other implementations will lay out stuff similarly.
By the way, memory layout is not the only thing to optimize, and this is another reason you should consider not trying to second-guess it. Paging behavior may be another thing to optimize, and you have no idea how that is done. EVEN IF I told you that an object took up 2 bytes, you still wouldn't know if those two bytes are contiguous, split into separate spaces to take advantage of some parallel architecture, broken over a page boundary, etc. Sometimes these matter. But never portably. Because architectures vary. Also, certain size objects might work well for certain operations because they are tuned to a particular internal machine instruction--getting a more compact data layout may get you worse performance in some cases. Again, Lisp makes no claims about any of this, so trying to write solutions to these imagined problems in a portable way is meaningless. Implementation-specific answers may make sense, but then all you need is an implementation-specific tool that matches the need of that implementation. Whether said tool should be portably available, I'm doubtful.
Suppose you and I share a townhouse where we have a wall in common. If the board to make walls is sold in whole-wall sizes, and I ask you how much board it takes to make a housing unit, is the answer 4 [walls] because that's what it would take to start a new house, 3 because that's what it takes to add a unit, 3.5 because that's what it costs to share expenses, or...? If you're used to free-standing houses, you might assume the answer is simple, but it isn't always.
I hope this helps you see my point of view. Not trying to say it makes your problem any less real, and not trying to say there aren't vendors out there who can help you on a per-vendor basis, but from the point of view of the Language, I just don't understand how it could offer you better than what it does without crippling implementations in their attempt to become better over time. C, by contrast, pretty much can't become better over time (without making you change your programs) and so is happy to offer you what you want in a clear way.
* David Bakhash <ca...@mit.edu>
| I'm using ACL 5.0, on MS Windows.  But all I need is an estimate on this:
|
| (size-of "")
|
| and any implementation of Lisp on any system will probably give me a good
| enough idea.  It's just a parameter for an algorithm of mine to work.
| it's not the crux, by any means.
* Nick Levine <n...@harlequin.co.uk>
| Ugh, looks like hard work.
sure. SIZE-OF should be hard work. we don't want to encourage people to do simple stuff like (time (make-string n)) for increasing values of n when what they are doing is entirely wrong to begin with. optimizing bad behavior is not a good idea. moreover, I'm sure he's happier trusting machine addresses than the output from the TIME macro. incidentally, TIME reports 32 "other bytes" too many in Allegro CL unless tuned with
(setq excl::time-other-base 32)
(thanks to Duane Rettig <du...@franz.com> for this tidbit.)
(1) Other techniques?

Apart from cdr-coding, could someone give another example of similar techniques (that would make the hypothetical (SIZE-OF FOO) return different values for the same FOO at different times)? (Asking just out of curiosity.) I can also imagine an implementation that always returns the same (EQ) pointer for the one and only empty string (""), and/or one that is optimised for short strings, so that the representation (and `size of') of strings of length 1 is different from that of strings of length n.
(2) sizeof makes sense, SIZE-OF doesn't
Anyway, C is so low level with respect to memory and memory representation of data objects that sizeof makes perfect sense there---but not with Lisp. (The C standard goes to great lengths in describing a specific storage model while avoiding actual machine dependence---seen anything like that in ANSI CL?)
Besides, if we were to have SIZE-OF in Lisp (I am not saying we should), should it return just the `payload' or also count the system bits? E.g., with fixnums stored in 32 bits with 3 tag bits, would the `size' of a fixnum be 29 bits or 32 bits? We could have a holy war on this issue... And this question simply is not applicable to languages that have sizeof, like C and C++.
(3) ROOM might also help
Also, if you really must have an idea about the (current!) `size' of an object without resorting to implementation-specific^1 means (which every implementation provides, or at least documents how things are represented internally), you could also use ROOM, but you'd better allocate a sufficient number of similar objects before the second^2 call to ROOM, as it may also count memory occupied by system objects that get allocated and freed behind the curtain.
_____________________________
^1 ROOM is standard (but its results are of course implementation-specific)
^2 I mean, of course, to: call ROOM; do the allocation; call ROOM
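The ROOM-differencing recipe above might be sketched like this (my sketch; ROOM's output format is implementation-specific, so the two printouts have to be diffed by eye and divided by N by hand):

```lisp
;; Estimate the per-object cost of CONSTRUCTOR's result by allocating
;; N of them between two calls to ROOM.  The objects are kept in an
;; array so the GC can't reclaim them between the two snapshots.
(defun estimate-size (constructor &optional (n 100000))
  (let ((keep (make-array n)))   ; keep all N objects live
    (room)                       ; first snapshot
    (dotimes (i n)
      (setf (aref keep i) (funcall constructor)))
    (room)                       ; second snapshot; diff by eye, / n
    keep))

;; e.g. (estimate-size (lambda () (make-string 0)))
```

A large N amortizes away the system objects that get allocated behind the curtain, which is exactly the caveat in the footnote.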
Good luck, Vassil.
Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
Replying to vassil's message, but not to him per se. He just gave me a nice place to attach some thoughts...
vniko...@poboxes.com writes:

> (1) Other techniques?
> Apart from cdr-coding, could someone give another example of similar
> techniques (that would make the hypothetical (SIZE-OF FOO) return
> different values for the same FOO at different times)?
The future is a long time. Taking surveys of existing implementations is not always a good measure of what could come, and if you abuse the results, you can risk changing the future. That said, I'll indulge you...
I think it's possible that self-adaptive tables would reimplement themselves based on a usage count or if they survive a certain level of gc. But I can't point to a specific implementation that does.
Another situation is the implementation of integers and floats in a number of implementations. It is possible that a number can have a size and then disappear. Uh, this is hard to explain, because it can't show itself in any test such as you ask about, and yet it is a point of philosophy that troubles me. Consider:

  (let* ((x 1)
         (y 1)
         (z (+ x y)))
    z)

What is (size-of x)? 1? What about (size-of y)? 1? Would it surprise you if I told you the size of the constants area of this program can be 1, not 2? (I hope not.) If you were to measure the size, of course, it would change. But if you try hard to see into my head and what it is that troubles ME about size-of, it's captured in this example. SIZE-OF is about a representation presupposing a specific implementation. But I don't ask LISP for a specific implementation--I ask it to solve my problem. I want it to have every aspect that I have not pinned down available for optimization, to the point of translating this program to

  (let ((z (+ 1 1))) z)

to the point of translating this program further to

  (let ((z 2)) z)

to the point of translating this program to

  2

Consequently, even asking whether there IS an X is troubling.
It can even recede to zero size if it's something like (+ x 1) and can be reduced to an instruction like "move r, x(1)" in some instruction set where the "addition" can be handled by clever indexing.
You could say that the issue is only what size-of can get its hands on, but not every size-of computation is just about numbers you can get your hands on. Someone might be using size-of in a meta-program like a macro to count distinct constants by recursive descent of a program at static-analysis time, and you'd have to know that this was wrong...
This question reminds me of the age-old philosophical question of whether people know that 2+2=4. Well, they probably do. If they die and you cut open their brain, you can probably, with enough technology, find the place where 2+2=4 is manifestly represented as a fact. But can you find 237+118=355? That's not at all so clear. What does it mean to "know" this? Sometimes it means it can be computed on demand. And so when you ask how much storage it takes for these facts, you might find that some facts are not stored at all. When you teach someone a high-level concept that can substitute for memory, I like to believe there is at least the potential to gc the many random facts that it supports, since the high-level concept will more compactly access a more general result. If not, people are ill architected. And with computers, where we don't rely on chance but on design for the architecture, I like to believe that we design in the right to drop irrelevant detail and I like to believe we've dealt with specific storage issues as irrelevant.
There was endless fussing in the design of CL over my original choice to refer to symbol-function and symbol-value as "slots" in a symbol. A certain vocal and influential set of people on the committee eventually prevailed in getting me not to say this, because they felt it would imply that the storage must be allocated, even if not needed, as a contiguous part of the object. Heck, I don't assume that just because CLOS objects have slots that they are allocated, or contiguous. That's up to an implementation. If it wants to grow a slot on demand and no operator can discern this, that's fine by me. At least from the language design point of view. A user might say it's not ok performance-wise, and that's a market issue. But I was forced to change the wording to a more neutral term to avoid confusion of a kind that I morally object to ever being considered possible. (It's not that I don't think people get confused in the way these people expected; it's that I don't believe in catering to people who have that confusion. If you think Lisp memory management is about dictating storage layout, I think you're just confused.)
What it is to be a slot is that the slot accessor gets the value and the slot setter changes what the accessor gets. What it is to be an object of a type is that you seamlessly integrate as an appropriate argument to all operations on your type. Everything else is fiction. Creating new operators like SIZE-OF does not just test what you did; it changes reality and requires reimplementation. The memory you want to measure is the Schroedinger's Cat of the Lisp world, and you're killing the poor fuzzy little thing by peeking inside. (You meaning whoever wants this operator; not necessarily you, Vassil.)
> I can also imagine an implementation that always returns the same
> (EQ) pointer for the one and only empty string (""),
Actually, of course, (make-array 10 :element-type 'character :fill-pointer 0) is an empty string, but I guess you mean the only empty string of fixed length and element-size that is not actually adjustable, does not have a fill pointer, etc. Heh. But I know what you mean. I think this is a good example in spite of the caveats one has to make in order to make it work. I just didn't want anyone walking away thinking there was only one empty string, lest they break their programs.
> and/or that could be optimised for short strings so that the
> representation (and `size of') strings of length 1 is different
> from that of strings of length n.
Well, indeed. Or you could gamble that most strings were all-lowercase or all-uppercase alphabetic, and use a specialized implementation of them that you had to morph if you ever did a setf of them. Indeed, tricks like this may well be used in high-quality commercial implementations trying to manage unicode, since you don't want to pay the price of a double-byte character if the person is not using them.
> (2) sizeof makes sense, SIZE-OF doesn't
> Anyway, C is so low level with respect to memory and memory
> representation of data objects that sizeof makes perfect sense
> there---but not with Lisp.  (The C standard goes to great
> lengths in describing a specific storage model while avoiding
> actual machine dependence---seen anything like that in ANSI CL?)
Yes, exactly. C is the language you use to lay out memory. It is useful to have some language that does control memory as an implementation substrate for others, but it's also useful to rise above that level, caring more about what's above than what's below.
> Besides, if we were to have SIZE-OF in Lisp (I am not
> saying we should), should its value return just the `payload'
> or also count the system bits?  E.g., with fixnums in 32 bits
> with 3 tag bits, would the `size' of a fixnum be 29 bits or
> 32 bits?  We could have a holy war on this issue...  And this
> question simply is not applicable to the languages that have
> sizeof like C and C++.
> (3) ROOM might also help
> Also, if you really must have an idea about the (current!) `size'
> of an object without resorting to implementation-specific^1 means
> (which every implementation provides, or at least documents
> how things are represented internally), you could also use
> ROOM, but you'd better allocate a sufficient number of
> similar objects before the second^2 call to ROOM as it may also
> count memory occupied by system objects that get allocated
> and freed behind the curtain.
yes, but don't forget to count sharing. Remember that ROOM also captures the myriad cases of '(#1=(A B) #1#), which if written to a file and re-read would take a different size. ROOM is very coarse-grained.
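To make the sharing point concrete (a sketch; the actual byte counts are of course implementation-specific):

```lisp
;; Two lists with the same printed form (absent *print-circle*) but
;; different amounts of actual cons storage: SHARED reuses one (A B)
;; chain for both elements, COPIED allocates two.
(let ((shared (let ((x (list 'a 'b))) (list x x)))
      (copied (list (list 'a 'b) (list 'a 'b))))
  (values (eq (first shared) (second shared))    ; => T,   one (A B)
          (eq (first copied) (second copied)))) ; => NIL, two (A B)'s
;; Print SHARED to a file without #1#-style labels and re-read it,
;; and you get COPIED's storage profile back, not SHARED's.
```

Any ROOM-based size estimate silently averages over whichever of these shapes your data happens to have.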
This is a good place to say "what are you trying to do and why are you trying to do it this way". Saying "I'm trying to optimize memory use" begs the question. For whom? What does your spec look like? Are you building a memory optimization tool that takes no parameters? Oughtn't it take parameters that allow the user to tune it for each of the myriad possible design decisions that the implementation might have taken, and if you (again, the generic "you") don't offer args like :cdr-coded-implementation-p, then aren't you assuring that your "portable tool" will be meaningless even if you can "port" it?
* vnikolov wrote:

> Apart from cdr-coding, could someone give another example of similar
> techniques (that would make the hypothetical (SIZE-OF FOO) return
> different values for the same FOO at different times)?  (Asking just
> out of curiosity.)  I can also imagine an implementation that always
> returns the same (EQ) pointer for the one and only empty string (""),
> and/or that could be optimised for short strings so that the
> representation (and `size of') strings of length 1 is different
> from that of strings of length n.
Hash tables with rehashing on GC. CLOS objects that have had their class definitions changed but that the system is doing some cleverness on that only reclaims some of the slot storage lazily, perhaps on GC.
The HP48 calculator, which runs a language that is lisp-related though not actually a lisp really, has two representations for things like small real numbers which use different amounts of storage! I'm not sure if it ever fully-automatically converts between representations, but it may do.