From: Erik Naggum <e...@naggum.no>
Subject: Re: hashtable w/o keys stored...
Organization: Naggum Software; +47 8800 8879; http://www.naggum.no

* David Bakhash <ca...@mit.edu>
| suppose, ideally, I want a hash table whose keys are strings and whose
| values are objects. However, I have so many keys, and each key (string)
| is, say, 50 characters long. With 50,000 keys, this can start to really
| swallow up memory. But if, instead, the objects in the hash buckets
| somehow contain the strings efficiently, then I can create those strings
| when needed, but not store them permanently.
| If you're wondering how that can be done, just imagine that two numbers
| (pos . length) specify a sub-string inside one huge string. So just
| storing the pos and length is enough.
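
For concreteness, the (pos . length) scheme in the quoted text could be sketched roughly as follows in Common Lisp. This is only an illustration, not code from the thread: the names *POOL*, *BUCKETS*, ENTRY, PUT-VALUE, GET-VALUE and ENTRY-KEY are all hypothetical. Entries are bucketed by the SXHASH of the key and compared in place against one big string, so no per-key string is kept. For scale, the keys in question total about 50,000 x 50 = 2.5 million characters.

```lisp
;;; Sketch only: every name below is hypothetical, not from the thread.

(defvar *pool*
  (make-array 1024 :element-type 'character :adjustable t :fill-pointer 0)
  "One big string holding the text of every key, back to back.")

(defvar *buckets* (make-hash-table :test #'eql)
  "Maps (SXHASH key) to a list of ENTRY structures.")

;; Each entry records where its key lives in *POOL*, not the key itself.
(defstruct entry pos len value)

(defun find-entry (key)
  "Return the entry whose pooled text equals KEY, or NIL."
  (find-if (lambda (entry)
             (let ((pos (entry-pos entry)))
               ;; compare in place; no substring is consed
               (string= key *pool*
                        :start2 pos :end2 (+ pos (entry-len entry)))))
           (gethash (sxhash key) *buckets*)))

(defun put-value (key value)
  "Associate KEY with VALUE, copying KEY's text into the pool at most once."
  (let ((entry (find-entry key)))
    (if entry
        (setf (entry-value entry) value)
        (let ((pos (fill-pointer *pool*)))
          (loop for ch across key do (vector-push-extend ch *pool*))
          (push (make-entry :pos pos :len (length key) :value value)
                (gethash (sxhash key) *buckets*))))
    value))

(defun get-value (key)
  "Return the value stored under KEY and a found-p flag."
  (let ((entry (find-entry key)))
    (if entry
        (values (entry-value entry) t)
        (values nil nil))))

(defun entry-key (entry)
  "Recreate the key string for ENTRY on demand."
  (let ((pos (entry-pos entry)))
    (subseq *pool* pos (+ pos (entry-len entry)))))
```

With this sketch, (put-value "alpha" 1) followed by (get-value "alpha") returns the stored value, and ENTRY-KEY conses a fresh string only when one is actually asked for. Note that each entry still carries its own overhead (a structure, two integers, a bucket cons), which is part of what has to be weighed against the string storage it saves.
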
I'd like to see the actual size of the application that needs this kind
of anal-retentive "efficiency" tinkering. I have an application that I
thought would gobble up all available memory in production, so I wrote a
little thing using SLOT-UNBOUND that gave me lazy-loaded messages that
could be "unloaded" and read back from disk on demand, and prepared to
tie this into GC hooks and all. a message is between 100 bytes and 40K
long, and contains a _lot_ of structure, so the 100-byte message conses
about 1K and the 40K message about 44K. there are approximately 350 of
them per day, 5 days a week today. there will be approximately 3000 of
them per day at the end of this month, albeit the increase will be in
very small messages. the machine has only 64M RAM, so I worried a bit.

a test run of 15000 evenly paced messages (so GC could run near normally
and tenure these bastards at a normal pace) caused Allegro CL to swell to
34M RAM. it was no problem at all. the SLOT-UNBOUND stuff is useful for
a lot of other reasons, so it wasn't a waste of time, but UNLOAD-MESSAGE
never gets called in the production version. the part of my brain that
still remembers when I thought C was a programming language had voiced
its usual misguided protests. a C programmer would have worried that 34M
was a lot of wasted memory. after all, the active lifetime of a message
averages out at 15 seconds, but worst case, it's hanging around in memory
for a week, just because some client _may_ want an old copy of it. the
optimization problem is basically this: it would cost more to figure out
when to unload these messages than I would ever hope to save by doing so.
since the memory is not used for anything else, _any_ time spent on
figuring out when to unload would be a waste, and the occasional load
would _also_ be a loss. still, I'll bet a C programmer would cringe at
the "inefficiency" of having more or less dead objects take up many
megabytes of RAM for a whole week, but what else would I do with that
memory? (it's not like I run Windows or anything.)
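
The lazy-loading arrangement described above (a SLOT-UNBOUND method that reads a message back from disk, plus UNLOAD-MESSAGE to drop it) might look something like the following. This is a minimal sketch under stated assumptions, not the code from the post: the class MESSAGE, its slots PATH and BODY, and the helper READ-FILE are hypothetical, and the real messages carry far more structure than a single string. Only SLOT-UNBOUND, SLOT-MAKUNBOUND and the name UNLOAD-MESSAGE come from the post or the language standard.

```lisp
;;; Sketch only: MESSAGE, PATH, BODY and READ-FILE are hypothetical names.

(defclass message ()
  ((path :initarg :path :reader message-path)
   ;; no :initform or :initarg, so BODY starts out unbound
   (body :accessor message-body)))

(defun read-file (path)
  "Return the contents of the file at PATH as a string."
  (with-open-file (in path)
    (let* ((text (make-string (file-length in)))
           (end (read-sequence text in)))
      (subseq text 0 end))))

;; Called by the object system whenever BODY is read while unbound.
;; The value returned here becomes the value of the slot access.
(defmethod slot-unbound (class (msg message) (slot (eql 'body)))
  (declare (ignore class))
  (setf (slot-value msg 'body) (read-file (message-path msg))))

(defun unload-message (msg)
  "Drop the in-memory body; the next access reloads it from disk."
  (slot-makunbound msg 'body)
  msg)
```

The first call to (message-body m) on a fresh instance triggers the load and caches the result in the slot; later calls are ordinary slot reads until UNLOAD-MESSAGE is called. The caller never has to know whether the message is currently in memory, which is why the mechanism stays useful even if nothing ever unloads.
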
the hardest part for a programmer used to C and C++ and that crap is to
shed the _invalid_ concerns. psychologists call them "obsessions" and
charge people a lot to get rid of them. some programmers charge their
users a lot to be able to keep them. go figure.