* David Bakhash <ca...@mit.edu>
| suppose, ideally, I want a hash table whose keys are strings and whose
| values are objects. However, I have so many keys, and each key (string)
| is, say 50 characters long. With 50,000 keys, this can start to really
| swallow up memory. But if, instead, the objects in the hash buckets
| somehow contain the strings efficiently, then I can create those strings
| when needed, but not store them permanently.
|
| If you're wondering how that can be done, just imagine that two numbers
| (pos . length) specify a sub-string inside one huge string. So just
| storing the pos and length is enough.
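The quoted (pos . length) idea can be sketched roughly as follows. This is a hypothetical illustration in Python, not code from the thread: the class name `PooledStringTable` and its methods are made up. All key text is appended to one shared buffer, and each bucket entry keeps only a (pos, length) pair plus the value. The key string is rebuilt only when someone asks for it. Bucket selection uses Python's built-in `hash` on the key bytes. This is an assumption of the sketch, and the per-entry bookkeeping still costs something, so it shows the layout rather than measuring any savings.

```python
class PooledStringTable:
    """Hypothetical sketch: a string-keyed table whose key text lives in one shared buffer.

    Each key is stored only as a (pos, length) pair into self._pool; the key
    string itself is re-created on demand instead of being kept around.
    """

    def __init__(self):
        self._pool = bytearray()   # the "one huge string" holding every key's text
        self._buckets = {}         # hash(key bytes) -> list of (pos, length, value)

    def _key_at(self, pos, length):
        # Re-create a key string only when it is actually needed.
        return self._pool[pos:pos + length].decode("utf-8")

    def put(self, key, value):
        data = key.encode("utf-8")
        bucket = self._buckets.setdefault(hash(data), [])
        for i, (pos, length, _) in enumerate(bucket):
            if self._pool[pos:pos + length] == data:
                # Existing key: replace the value and reuse the stored text.
                bucket[i] = (pos, length, value)
                return
        # New key: append its text to the pool and remember where it went.
        pos = len(self._pool)
        self._pool += data
        bucket.append((pos, len(data), value))

    def get(self, key, default=None):
        data = key.encode("utf-8")
        # Entries with colliding hashes share a bucket, so compare the actual text.
        for pos, length, value in self._buckets.get(hash(data), ()):
            if self._pool[pos:pos + length] == data:
                return value
        return default

    def keys(self):
        # Yield each key, rebuilt from its (pos, length) pair.
        for bucket in self._buckets.values():
            for pos, length, _ in bucket:
                yield self._key_at(pos, length)
```

For example, after `put("alpha", 1)`, `put("beta", 2)` and `put("alpha", 3)`, the pool holds the 9 bytes `alphabeta` once, and `get("alpha")` returns 3.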
I'd like to see the actual size of the application that needs this kind of anal-retentive "efficiency" tinkering.
I have an application that I thought would gobble up all available memory in production, so I wrote a little thing using SLOT-UNBOUND that gave me lazy-loaded messages that could be "unloaded" and read back from disk on demand, and I was prepared to tie this into GC hooks and all. a message is between 100 bytes and 40K long and contains a _lot_ of structure, so the 100-byte message conses about 1K and the 40K message about 44K. today there are approximately 350 of them per day, 5 days a week. by the end of this month there will be approximately 3000 per day, although the increase will be in very small messages. the machine has only 64M RAM, so I worried a bit. a test run of 15000 evenly paced messages (so GC could run normally and tenure these bastards at a normal pace) caused Allegro CL to swell to 34M RAM. it was no problem at all. the SLOT-UNBOUND stuff is useful for a lot of other reasons, so it wasn't a waste of time, but UNLOAD-MESSAGE never gets called in the production version.
the part of my brain that still remembers when I thought C was a programming language had voiced its usual misguided protests. a C programmer would have worried that 34M was a lot of wasted memory. after all, the active lifetime of a message averages 15 seconds, but in the worst case it hangs around in memory for a week, just because some client _may_ want an old copy of it. the optimization problem is basically this: it would cost more to figure out when to unload these messages than I could ever hope to save by doing so. since the memory is not used for anything else, _any_ time spent figuring out when to unload would be wasted, and the occasional reload from disk would _also_ be a loss. still, I'll bet a C programmer would cringe at the "inefficiency" of having more or less dead objects take up many megabytes of RAM for a whole week, but what would I do with it?
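The lazy-loading trick can be sketched roughly as follows. This is not the author's Allegro CL code, which isn't shown. In CLOS, reading an unbound slot calls the generic function SLOT-UNBOUND, and a method on it can fetch the value and store it. Python's closest analogue is `__getattr__`, which runs only when normal attribute lookup fails, so this sketch uses that as a stand-in. `LazyMessage`, its `path` and `body` attributes, and `unload` are all hypothetical names. `unload` plays roughly the role SLOT-MAKUNBOUND would play in Lisp.

```python
import os
import tempfile


class LazyMessage:
    """Hypothetical sketch of a message whose body is read from disk on demand.

    `body` is deliberately left unset; Python calls __getattr__ only when
    normal lookup fails, much as CLOS calls SLOT-UNBOUND on an unbound slot.
    """

    def __init__(self, path):
        self.path = path   # where the message text lives on disk
        self.loads = 0     # how many times we actually went to disk

    def __getattr__(self, name):
        if name == "body":
            with open(self.path, "r", encoding="utf-8") as f:
                text = f.read()
            self.loads += 1
            self.body = text   # cache it; later reads bypass __getattr__
            return text
        raise AttributeError(name)

    def unload(self):
        # Drop the cached body so the next access re-reads it from disk.
        self.__dict__.pop("body", None)
```

Here `m.body` hits the disk only on first access and again after `m.unload()`. As the post says, in practice the unload step may never be worth calling.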
(it's not like I run Windows or anything.)
the hardest part for a programmer used to C and C++ and that crap is to shed the _invalid_ concerns. psychologists call them "obsessions" and charge people a lot to get rid of them. some programmers charge their users a lot to be able to keep them. go figure.