> How about the following approach then. I don't know
> how to do this in tcl, but in C, I would:
>
> Allocate a large bit array, say 1 GB, i.e. 8 Gbit. Init all bits to 0.
>
> Hash each nick as you get it into that 8 Gbit space, using modulo 8G
> where needed, and read the bit.
>
> If the bit was set don't reply, else set the bit to 1 and reply.
>
> Of course, if more than 1 nick hashes to the same bit, you'll miss
> sending out a reply.
>
> Only the OP knows if this is acceptable. There won't
> be any extra replies, however.
>
> So, with 8 Gbit, if I did my math right, and we want to cover
> 100 million nicks, then the likelihood of missing a reply would be
> about 1 in 800.

That depends on whether the OP needs "perfection" or "close enough" (and
on just how long the OP plans to keep this running between restarts of
the process, although a restart would also start the automated "hi"
messages all over again). But yes, this method would constrain the
memory usage to a fixed amount, at the expense of never saying "hi" to
some percentage of nicks.
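
For illustration, here is a minimal sketch of the bit-array idea in
Python rather than Tcl or C. The class name, the method names and the
choice of SHA-256 as the hash are all my own assumptions, not anything
from the thread. The helper at the end estimates the miss rate for a
uniform hash; under that assumption, 100 million nicks in 8 * 2**30
bits comes out near 0.0058, i.e. roughly 1 in 170, so the 1-in-800
figure quoted above looks optimistic.

```python
import hashlib
import math


class SeenBits:
    """Fixed-size bit array: one bit per hash slot, memory never grows."""

    def __init__(self, nbits):
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)  # every bit starts at 0

    def _slot(self, nick):
        # Assumed hash: first 8 bytes of SHA-256, reduced modulo nbits.
        digest = hashlib.sha256(nick.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.nbits

    def should_greet(self, nick):
        """True the first time a slot is hit, False on every later hit."""
        slot = self._slot(nick)
        byte, mask = slot >> 3, 1 << (slot & 7)
        if self.bits[byte] & mask:
            return False  # nick seen before, or collided with another nick
        self.bits[byte] |= mask
        return True


def expected_miss_rate(nicks, nbits):
    """Expected fraction of distinct nicks that never get a reply.

    Assumes a uniform hash: about nbits * (1 - exp(-nicks / nbits))
    slots end up set, and every nick beyond that count was a collision.
    """
    load = nicks / nbits
    return 1.0 + math.expm1(-load) / load
```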

An alternative, which simply changes where the space is used, would be
to link in the sqlite driver and set up an indexed sqlite table to store
seen nicks; then the growth would be on disk instead of in RAM. As most
systems today have much more free disk than free RAM, this trades
scarce RAM for plentiful disk.
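
A rough sketch of the sqlite variant, again in Python (using its
standard sqlite3 module) rather than Tcl. The table name `seen`, the
column name and both function names are made up for the example; the
PRIMARY KEY constraint is what provides the index on the nick.

```python
import sqlite3


def open_seen_db(path):
    """Open (or create) the on-disk store of nicks already greeted."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY makes sqlite maintain a unique index on nick.
    conn.execute("CREATE TABLE IF NOT EXISTS seen (nick TEXT PRIMARY KEY)")
    return conn


def should_greet(conn, nick):
    """Record the nick; return True only if it was not stored before."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen (nick) VALUES (?)", (nick,)
        )
    # rowcount is 1 when a row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1
```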

Another alternative is to store not only the nick, but also a timestamp
for "last seen", then periodically expire nicks with a last-seen date
older than some amount (week, month, six months, etc.). The expiration
process would act as a garbage collector to keep the memory usage
reasonably bounded. But again, this means that some nicks will get an
automated "hi" all over again.
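
The last-seen idea could look something like this, sketched in Python
with an in-memory dict. The class name, the method names and the
one-week default are assumptions for the example; the `now` parameter
exists only so the behaviour can be exercised without waiting.

```python
import time

WEEK = 7 * 24 * 3600  # assumed default expiry, in seconds


class LastSeen:
    """Remember when each nick was last seen; forget the stale ones."""

    def __init__(self, max_age=WEEK):
        self.max_age = max_age
        self.last_seen = {}  # nick -> timestamp of the latest sighting

    def should_greet(self, nick, now=None):
        """True for an unknown nick or one absent longer than max_age."""
        now = time.time() if now is None else now
        previous = self.last_seen.get(nick)
        self.last_seen[nick] = now  # every sighting refreshes the stamp
        return previous is None or now - previous > self.max_age

    def expire(self, now=None):
        """Garbage-collect stale nicks; return how many were dropped."""
        now = time.time() if now is None else now
        cutoff = now - self.max_age
        stale = [n for n, t in self.last_seen.items() if t < cutoff]
        for n in stale:
            del self.last_seen[n]
        return len(stale)
```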

But we don't know what the OP really wanted. I suspect they did not
want absolute "perfection". If they were ever to explain, in an
understandable way, exactly what they want to accomplish, I suspect we'd
learn that the "say hi once" requirement was really:

   Don't automatically say hi more often than once a week

or something similar. I.e., I suspect the "expire old last-seen nicks"
method I suggest above would be exactly what the OP wants. Frequent
nicks get only one automated hi; infrequent ones would get more than
one, but for an infrequent nick this is probably not a bad thing.