From: Erik Naggum <e...@naggum.no>
Subject: Re: hashtable w/o keys stored...
Date: 1999/01/14
Message-ID: <3125273816170693@naggum.no>#1/1
X-Deja-AN: 432399590
References: <cxjlnj8zh33.fsf@acs1.bu.edu> <77g8fd$jfd@pravda.cc.gatech.edu> <wWNm2.88$oD6.7537@burlma1-snr1.gtei.net> <77gh61$k3n@pravda.cc.gatech.edu> <wkiueb2krx.fsf@mit.edu>
mail-copies-to: never
Organization: Naggum Software; +47 8800 8879; http://www.naggum.no
Newsgroups: comp.lang.lisp

* David Bakhash <ca...@mit.edu>
| suppose, ideally, I want a hash table whose keys are strings and whose
| values are objects.  However, I have so many keys, and each key (string)
| is, say 50 characters long.  With 50,000 keys, this can start to really
| swallow up memory.  But if, instead, the objects in the hash buckets
| somehow contain the strings efficiently, then I can create those strings
| when needed, but not store them permanently.
| 
| If you're wondering how that can be done, just imagine that two numbers
| (pos . length) specify a sub-string inside one huge string.  So just
| storing the pos and length is enough.
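
The scheme quoted above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: KEY-POOL, POOL-PUT and POOL-GET are hypothetical names, the bucket count of 1024 is an arbitrary assumption, and the table never resizes or checks for duplicate keys. Every key's characters are appended once to a single shared string, and each bucket entry keeps only the position, the length and the value. Note that 50,000 keys of 50 characters still amount to 2,500,000 characters of key text; what the scheme saves is the per-string object overhead, not the characters themselves.

```lisp
;; Hypothetical sketch: all key text lives in one adjustable string,
;; and each entry remembers only where its key starts and how long it is.
(defstruct key-pool
  (text (make-array 0 :element-type 'character
                      :adjustable t :fill-pointer 0))
  (buckets (make-array 1024 :initial-element nil))) ; assumed fixed size

(defun bucket-index (pool key)
  "Map KEY to a bucket; equal strings always land in the same bucket."
  (mod (sxhash key) (length (key-pool-buckets pool))))

(defun pool-put (pool key value)
  "Append KEY's characters to the shared text and record (pos length . value).
Does not check whether KEY is already present."
  (let* ((text (key-pool-text pool))
         (pos (fill-pointer text)))
    (loop for ch across key do (vector-push-extend ch text))
    (push (list* pos (length key) value)
          (aref (key-pool-buckets pool) (bucket-index pool key)))
    value))

(defun pool-get (pool key)
  "Return the value stored under KEY and T, or NIL and NIL if absent."
  (let ((text (key-pool-text pool)))
    (dolist (entry (aref (key-pool-buckets pool) (bucket-index pool key))
                   (values nil nil))
      (destructuring-bind (pos len . value) entry
        ;; Compare the probe against the shared text in place,
        ;; without creating a substring.
        (when (and (= len (length key))
                   (string= key text :start2 pos :end2 (+ pos len)))
          (return (values value t)))))))
```

A lookup such as (pool-get pool "some key") compares against the region of the shared string directly, so no key string is created or kept per entry.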

  I'd like to see the actual size of the application that needs this kind
  of anal-retentive "efficiency" tinkering.  I have an application that I
  thought would gobble up all available memory in production, so I wrote a
  little thing using SLOT-UNBOUND that gave me lazy-loaded messages that
  could be "unloaded" and read back from disk on demand, and prepared to
  tie this into GC hooks and all.  a message is between 100 bytes and 40K
  long, and contains a _lot_ of structure, so the 100-byte message conses
  about 1K and the 40K message about 44K.  there are approximately 350 of
  them per day, 5 days a week today.  there will be approximately 3000 of
  them per day at the end of this month, albeit the increase will be in
  very small messages.  the machine has only 64M RAM, so I worried a bit.
  a test run of 15000 evenly paced messages (so GC could run near normally
  and tenure these bastards at a normal pace) caused Allegro CL to swell to
  34M RAM.  it was no problem at all.  the SLOT-UNBOUND stuff is useful for
  a lot of other reasons, so it wasn't a waste of time, but UNLOAD-MESSAGE
  never gets called in the production version.  the part of my brain that
  still remembers when I thought C was a programming language had voiced
  its usual misguided protests.  a C programmer would have worried that 34M
  was a lot of wasted memory.  after all, the active lifetime of a message
  averages out at 15 seconds, but worst case, it's hanging around in memory
  for a week, just because some client _may_ want an old copy of it.  the
  optimization problem is basically this: it would cost more to figure out
  when to unload these messages than I would ever hope to save by doing so
  and since the memory is not used for anything else, _any_ time spent on
  figuring out when to unload would be a waste, and the occasional load
  would _also_ be a loss.  still, I'll bet a C programmer would cringe at
  the "inefficiency" of having more or less dead objects take up many
  megabytes of RAM for a whole week, but what would I do with it?  (it's
  not like I run Windows or anything.)
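
  For readers unfamiliar with the SLOT-UNBOUND technique mentioned above, here is a minimal sketch of how lazy loading and unloading could look in CLOS. It is not the code from the application described: the class MESSAGE, its slots PATH and BODY, the helper LOAD-BODY-FROM-DISK and this version of UNLOAD-MESSAGE are all assumptions made for illustration, and a message is assumed to be stored as one text file.

```lisp
;; Hypothetical names throughout; only SLOT-UNBOUND, SLOT-MAKUNBOUND and
;; the other standard operators are part of Common Lisp itself.
(defclass message ()
  ((path :initarg :path :reader message-path)
   (body :accessor message-body)))      ; deliberately starts out unbound

(defun load-body-from-disk (path)
  "Read the whole file at PATH into a fresh string."
  (with-open-file (in path :direction :input)
    (let* ((buffer (make-string (file-length in)))
           (end (read-sequence buffer in)))
      ;; FILE-LENGTH may exceed the character count, so trim to what was read.
      (subseq buffer 0 end))))

(defmethod slot-unbound ((class t) (msg message) (slot (eql 'body)))
  ;; Invoked when BODY is read while unbound: load it, cache it, return it.
  (setf (slot-value msg 'body)
        (load-body-from-disk (message-path msg))))

(defun unload-message (msg)
  "Drop the cached body; the next read of BODY reloads it from disk."
  (slot-makunbound msg 'body)
  msg)
```

  With this in place, (message-body msg) behaves the same whether or not the body is currently in memory, which is why the unloading side can simply go unused when memory turns out not to be a problem.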

  the hardest part for a programmer used to C and C++ and that crap is to
  shed the _invalid_ concerns.  psychologists call them "obsessions" and
  charge people a lot to get rid of them.  some programmers charge their
  users a lot to be able to keep them.  go figure.

#:Erik