How to implement a Hash Table in C

ravi

unread,

Aug 11, 2007, 4:13:27 AM8/11/07

to

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Malcolm McLean

unread,

Aug 11, 2007, 4:29:09 AM8/11/07

to

"ravi" <dcerav...@gmail.com> wrote in message
news:1186820007.2...@i13g2000prf.googlegroups.com...

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The hash
tables chapter is free.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

santosh

unread,

Aug 11, 2007, 7:29:31 AM8/11/07

to

ravi wrote:

The hash table is itself a data structure. It works in conjunction with hash
function.

<http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html>
<http://www.cl.cam.ac.uk/~cwc22/hashtable/>
<http://www.monad.me.uk/algorithms/hash.html>
<http://cbfalconer.home.att.net/download/hashlib.zip>
<http://en.wikipedia.org/wiki/Hash_table>

Ben Bacarisse

unread,

Aug 11, 2007, 8:22:20 AM8/11/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "ravi" <dcerav...@gmail.com> wrote in message
> news:1186820007.2...@i13g2000prf.googlegroups.com...
>> Hi
>> can anybody tell me that which ds will be best suited to implement a
>> hash table in C/C++
>>
>> thanx. in advanced
>>
> Which ds?
> You can read all about hash tables in my book, Basic Algorithms. The
> hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.

It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.

[Aside: shouldn't a book on basic algorithms illustrate (or at least
describe) how a hash table can grow to accommodate more data?]

To the OP:
1) A proper hash table can grow as data are added.

2) The add function should return an error if the data can't be added
-- for example, some do if the key already exists but it certainly
should if the operation would overflow allocated storage.

3) Most designers would make a hash table, in C at least, that did not
duplicate the key and the data. The idea being to leave the actual
data allocation up to the table user. The table itself would just
store pointers.

--
Ben.

CBFalconer

unread,

Aug 11, 2007, 11:48:43 AM8/11/07

to

ravi wrote:
>
> can anybody tell me that which ds will be best suited to implement
> a hash table in C/C++

I am not sure what your question is. However, a complete, and
portable, hash table implementation under GPL is available as
hashlib at:

<http://cbfalconer.home.att.net/download/>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Malcolm McLean

unread,

Aug 11, 2007, 3:07:20 PM8/11/07

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
news:87fy2q9...@bsb.me.uk...

> "Malcolm McLean" <regn...@btinternet.com> writes:
>
>> "ravi" <dcerav...@gmail.com> wrote in message
>> news:1186820007.2...@i13g2000prf.googlegroups.com...
>>> Hi
>>> can anybody tell me that which ds will be best suited to implement a
>>> hash table in C/C++
>>>
>>> thanx. in advanced
>>>
>> Which ds?
>> You can read all about hash tables in my book, Basic Algorithms. The
>> hash tables chapter is free.
>
> I think it should be pointed out that your hash table is not like
> most. Most hash tables are dynamic structures that can grow as data
> are added. Yours is fixed. The interface should have a clear warning
> that the table fails (silently) if an attempt is made to add an item
> beyond the initial capacity.
>
> The point I made before (that the function loop indefinitely when the
> table becomes full) is still true, but I now see that the add function
> will silently fail long before that point is reached. If this limit
> is by design, I think the text should make that limitation very clear.
>

I think your understanding of hash tables might need to be raised a notch.
If a hash table beomes over full then the algorithm becomes pathological.
Just as if there are only a few empty spaces in a car park, you spend
virtulaly all your time crawling around looking for a slot.
So the table length is twice the capcity, passed into the constructor.
However if the table items are large, that wastes space. So the able itself
id a list of pointers. However calling malloc() for each item would be
inefficient unless the items are extremely large. So we use the fixed
allocator (in the chapter memory games, also free).
This will be exhaused once the table capacity is reached, so the algorithm
will never reach the pathological condition. I admit that this code might be
a bit hairy. It is probably best to add an explicit test against table
capacity to make clear that the table cannot be over-filled.
The initial code is a simlle hash table tyhat uses linear probing, and has a
fixed capacity. The second implenetation introduces quadratic probing and
dynamic tables. These do raise a lot of memory management and efficiency
issues, which are evaded by making th second table store only pointers

>
> It is odd (and probably very confusing to a beginner) that the add
> function seems to return a success/fail indication, but that it does
> not use it to signal this disastrous case.
>

The "disastrous case" can't happen. The fixed allocator will run out of
memory and return null when capity is reached. Please avoid this sort of
hyperbole.

>
> [Aside: shouldn't a book on basic algorithms illustrate (or at least
> describe) how a hash table can grow to accommodate more data?]
>

It does. That's in the second half of the chapter. I introduce the central
idea first, and then discuss some refinements and complications

>
> To the OP:
> 1) A proper hash table can grow as data are added.
>

Done

>
> 2) The add function should return an error if the data can't be added
> -- for example, some do if the key already exists but it certainly
> should if the operation would overflow allocated storage.
>

Done.

>
> 3) Most designers would make a hash table, in C at least, that did not
> duplicate the key and the data. The idea being to leave the actual
> data allocation up to the table user. The table itself would just
> store pointers.
>

The second table does store pointers. The best representation of keys is a
whole separate issue, which I don't go into in any detail.
>
The other basic point you've missed is that the code is designed to
introduce hash tables gradually. So a few details are missing from the first
examples, so that the central concept of hashing can be brought across.
If you want to go from a state of not understanding hash tables to
understanding how to code them, go to my book. If you want a drop-in hash
table to put into production C code, Chuck's table is better. There is no
point in us duplicating our efforts.

Ark Khasin

unread,

Aug 11, 2007, 3:32:09 PM8/11/07

to

Mr McLean appears to be a man of strong faith. As such, he is not
persuaded/dissuaded by rational argument of which elsethread is plenty.

CBFalconer

unread,

Aug 11, 2007, 3:32:40 PM8/11/07

to

Malcolm McLean wrote:
>
... snip ...

>
> I think your understanding of hash tables might need to be raised
> a notch. If a hash table beomes over full then the algorithm
> becomes pathological. Just as if there are only a few empty spaces
> in a car park, you spend virtulaly all your time crawling around
> looking for a slot.

Not necessarily so. For example, the code in hashlib automatically
expands the table, so that the system is normally between roughly
44 and 88% full. This expansion can continue to ridiculous limits.

Keith Thompson

unread,

Aug 11, 2007, 6:09:32 PM8/11/07

to

ravi <dcerav...@gmail.com> writes:
> can anybody tell me that which ds will be best suited to implement a
> hash table in C/C++

Until I read some of the responses, I had no idea what you meant by
"ds" (apparently it's an abbreviation for "data structure").

Please don't use abbreviations like that unless you're sure everyone
is going to understand what they mean. It doesn't take very long to
type out the words "data structure".

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ben Bacarisse

unread,

Aug 11, 2007, 11:02:24 PM8/11/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:87fy2q9...@bsb.me.uk...
>> "Malcolm McLean" <regn...@btinternet.com> writes:
>>
>>> "ravi" <dcerav...@gmail.com> wrote in message
>>> news:1186820007.2...@i13g2000prf.googlegroups.com...
>>>> Hi
>>>> can anybody tell me that which ds will be best suited to implement a
>>>> hash table in C/C++
>>>>
>>>> thanx. in advanced
>>>>
>>> Which ds?
>>> You can read all about hash tables in my book, Basic Algorithms. The
>>> hash tables chapter is free.
>>
>> I think it should be pointed out that your hash table is not like
>> most. Most hash tables are dynamic structures that can grow as data
>> are added. Yours is fixed. The interface should have a clear warning
>> that the table fails (silently) if an attempt is made to add an item
>> beyond the initial capacity.
>>
>> The point I made before (that the function loop indefinitely when the
>> table becomes full) is still true, but I now see that the add function
>> will silently fail long before that point is reached. If this limit
>> is by design, I think the text should make that limitation very clear.
>>
> I think your understanding of hash tables might need to be raised a
> notch.

If you are going to be like that...

<snip>

>> It is odd (and probably very confusing to a beginner) that the add
>> function seems to return a success/fail indication, but that it does
>> not use it to signal this disastrous case.
>>
> The "disastrous case" can't happen. The fixed allocator will run out
> of memory and return null when capity is reached. Please avoid this
> sort of hyperbole.

OK. Less hyperbole. Your code exhibits undefined behaviour. You
'dereference' a null pointer. On my system I get a segmentation
violation before hash_add gets any say in the matter. If or when you
find the error, you will say it is a detail, a typo, whatever. I
don't think you worry about details like this.

Do you care, for example, that your statement:

The test for primality is interesting. The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
attempt to divide by all prime numbers below or equal to its square
root.

is not true? Probably not. I will try, in future, to leave you alone.

--
Ben.

Malcolm McLean

unread,

Aug 12, 2007, 3:02:44 AM8/12/07

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:87y7ghh...@bsb.me.uk...

Obviously if you say things like "the function will not return at all when
the table is full" you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than about 88% full before the
algorithm becomes pathological. In fact I pass in a "capacity" argument
which is half table size.
I think you need to read the chapter again, as a pupil rather than as a
critic.

As for the derefencing null pointer, that will happen if the constructor
fails, and return NULL. That is standard behaviour throughout the book. If
objects cannot be created, the constructing function returns a null pointer
to show they have failed.
I couldn't see another. The code has been tested, but only on two systems (a
UNIX mainframe and Windows) so that doesn't mean no errors remain, and of
course typographical errors do creep in in the process of reformatting for
print - I wish I had a tool to format source code automatically but I don't.
The normal thing when you an error such as dereference of a null is to say
"your code derererences a null" rather than to talk airly about "catastropic
behaviour" in an arrogant manner.

>
> Do you care, for example, that your statement:
>
> The test for primality is interesting. The only absolutely sure way
> of determining whether a number is prime, as far as is known, is to
> attempt to divide by all prime numbers below or equal to its square
> root.
>
> is not true? Probably not. I will try, in future, to leave you alone.
>

You mean randomised primality testing? It's not absolutely sure.
I am no mathematician, maybe you could tell us your method?

James Dow Allen

unread,

Aug 12, 2007, 4:56:14 AM8/12/07

to

On Aug 11, 7:02 pm, "Malcolm McLean" <regniz...@btinternet.com> wrote:
> "Ben Bacarisse" <ben.use...@bsb.me.uk> wrote ...

> >> I think your understanding of hash tables might need to be raised a
> >> notch.

Hash tables are frequently discussed in this and "nearby"
ngs, but it is clear that most programmers are unaware of
innovations like Cuckoo Hash or "Compact Hash", etc.
These aren't just theoretical curiosities but have important
advantages over hash tables described in old textbooks.

> you cannot understand that a hash table (to be pedantic,
> one without "perfect hashing") cannot be more than
> about 88% full before the algorithm becomes pathological.

Cuckoo hash, just to give one example, can operate
well above 88%. ("Pathology" is a matter of semantics
since many methods can encounter O(n) behavior even
at low utilization, with non-zero probability,
due to very bad "luck".) Cuckoo hash, BTW, has
another very important space-savings advantage over
ordinary open-address hashing. I'll leave the
identification of this advantage as an exercise.... :-)

And popular "chained hashing" methods simply do
not suffer from the pathology you imply.

> The test for primality is interesting.

Very interesting, though I know little about it.

> > The only absolutely sure way
> > of determining whether a number is prime, as far as is known, is to

The largest *certainly* known non-Marsenne prime is
19249*2^13018586 + 1
a number with almost 4,000,000 digits.
I've no idea how its primality was proved,
but if they...

> > attempt to divide by all prime numbers below or equal to its square
> > root.

... they would have had to perform about 10^2000
long divisions (very long!). Assuming they'd
recruited a googol monkeys, and each monkey
could do a googol divisions per year, they'd
need 10^1800 years to prove primality your way!

I'm no primality expert, but it took me only a
few seconds at Wikipedia to discover:

> The first deterministic primality test significantly
> faster than the naïve methods was the cyclotomy test;
> its runtime can be proven to be
> O((log n) ^ clog log log n),
> where n is the number to test for primality and
> c is a constant independent of n.

Wikipedia (and Google) is your friend! The bad rep
Wikipedia has in recent media reports does not apply,
I think, to most science pages.

I didn't invent Cuckoo Hash (wish I had), learned
of it after Googling in a dialog with another computer
scientist (who hadn't heard of Cuckoo either, even
though he wrote papers on Hash Tables!)

I don't think I'm the only one whose posting
quality would go up if time wasted on Usenet
invective were spent instead on research.

James Dow Allen

Richard Tobin

unread,

Aug 12, 2007, 4:47:47 AM8/12/07

to

In article <APydnUqqh5j...@bt.com>,
Malcolm McLean <regn...@btinternet.com> wrote:

>You mean randomised primality testing? It's not absolutely sure.
>I am no mathematician, maybe you could tell us your method?

Google for "primality testing". There are deterministic tests faster
than testing all the primes < sqrt(n), and the probabilistic tests can
be used to reduce the probability as much as you like (for example, to
less than the chance of your computer being struck by lightning).

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

Malcolm McLean

unread,

Aug 12, 2007, 5:34:03 AM8/12/07

to

"James Dow Allen" <jdall...@yahoo.com> wrote in message
news:1186908974.3...@x40g2000prg.googlegroups.com...

On Aug 11, 7:02 pm, "Malcolm McLean" <regniz...@btinternet.com> wrote:
> "Ben Bacarisse" <ben.use...@bsb.me.uk> wrote ...
> >> I think your understanding of hash tables might need to be raised a
> >> notch.

>Hash tables are frequently discussed in this and "nearby"
>ngs, but it is clear that most programmers are unaware of
>innovations like Cuckoo Hash or "Compact Hash", etc.
>These aren't just theoretical curiosities but have important
>advantages over hash tables described in old textbooks.

>> you cannot understand that a hash table (to be pedantic,
>> one without "perfect hashing") cannot be more than
>> about 88% full before the algorithm becomes pathological.

>Cuckoo hash, just to give one example, can operate
>well above 88%. ("Pathology" is a matter of semantics
>since many methods can encounter O(n) behavior even
>at low utilization, with non-zero probability,
>due to very bad "luck".) Cuckoo hash, BTW, has
>another very important space-savings advantage over
>ordinary open-address hashing. I'll leave the
> identification of this advantage as an exercise.... :-)
>
>And popular "chained hashing" methods simply do
>not suffer from the pathology you imply.
>

The book is Basic Algorithms. I am aware that there are more sophisticated
methods of hashing, though I am not an expert in them.
However someone who doesn't understand that a simple hash table breaks down
when it becomes too full obviously needs their understanding even of the
basic algorithm raised a notch, as I said.

>
>> The test for primality is interesting.
>
> Very interesting, though I know little about it.
>

There are certain advantages in making the hash table length prime. However
it is a chapter on hash tables, not prime numbers. So I present a naive test
but mention that there are better ones.
In fact I was wrong about the naive test being the only deterministic one.
Sorry about that. There's no chapter on number theory, in case you are
wondering.

webs...@gmail.com

unread,

Aug 12, 2007, 6:11:13 AM8/12/07

to

On Aug 11, 4:29 am, santosh <santosh....@gmail.com> wrote:

> ravi wrote:
> > can anybody tell me that which ds will be best suited to implement a
> > hash table in C/C++
>

> The hash table is itself a data structure. It works in conjunction with
> hash function.

It sounds like he is asking what *other* data structures could be used
to implement it. Reallocatable arrays, pointers and structs seems
like the best choices to me. :) Yeah, I know, that's basically most
of what C has, but it seems they are all necessary to solve the
problem anyways. He also asked about C++, in which case you might
like to use a vector instead of a reallocatable array, but that's
apparently off topic here anyways.

The real magic of a hash table, of course, is the internal mapping
structure, the hash function and the operating methods that you decide
to allow for the hash table. But he didn't ask about that.

--
Paul Hsieh
http://www.pobox.com/~qed/hash.html
http://bstring.sf.net/

Eric Sosman

unread,

Aug 12, 2007, 9:09:46 AM8/12/07

to

Malcolm McLean wrote:
>
> Obviously if you say things like "the function will not return at all
> when the table is full" you cannot understand that a hash table (to be
> pedantic, one without "perfect hashing") cannot be more than about 88%
> full before the algorithm becomes pathological.

Nonsense. Utter uninformed nonsense. "Hash table" covers
a wide variety of data structures and associated algorithms,
with different characteristics. There's an excellent chapter
in TAOCP, and from the above quote there's no evidence that
you've understood any of it.

> I think you need to read the chapter again, as a pupil rather than as a
> critic.

I have subjected myself to about one and a half chapters
of your awful book already, and will not willingly read more.

> The normal thing when you an error such as
> dereference of a null is to say "your code derererences a null" rather
> than to talk airly about "catastropic behaviour" in an arrogant manner.

Is dereferencing a null pointer provably non-catastrophic?
Is it even "proBably" non-catastrophic? What guarantees apply
to undefined behavior that might limit its damage? What part
of "undefined" are you having trouble with? What part of "un-"
are you failing to grasp?

> I am no mathematician, [...]

I'd guessed as much.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

unread,

Aug 12, 2007, 9:19:53 AM8/12/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> As for the derefencing null pointer, that will happen if the
> constructor fails, and return NULL. That is standard behaviour
> throughout the book. If objects cannot be created, the constructing
> function returns a null pointer to show they have failed.
> I couldn't see another. The code has been tested, but only on two
> systems (a UNIX mainframe and Windows) so that doesn't mean no errors
> remain, and of course typographical errors do creep in in the process
> of reformatting for print - I wish I had a tool to format source code
> automatically but I don't. The normal thing when you an error such as
> dereference of a null is to say "your code derererences a null" rather
> than to talk airly about "catastropic behaviour" in an arrogant
> manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

One reason I have got so worked up about "details" is that your post
was in c.l.c. The things that I object to most strongly in the book
have nothing to do with C and so a lot of my objections get magnified
through those that *are* topical here. I will try to leave the
subject alone. I think we both know how the other feels about this
subject.

>> Do you care, for example, that your statement:
>>
>> The test for primality is interesting. The only absolutely sure way
>> of determining whether a number is prime, as far as is known, is to
>> attempt to divide by all prime numbers below or equal to its square
>> root.
>>
>> is not true? Probably not. I will try, in future, to leave you alone.
>>
> You mean randomised primality testing? It's not absolutely sure.

No. I mean simply that it is not a true statement. There are other
non-probabilistic methods for testing for primality.

> I am no mathematician, maybe you could tell us your method?

None is "my" method. They are all easy to find via a quick web
search. The two most famous are the recent Agrawal, Kayal and Saxena
algorithm (that finally proved that primality testing was in P as had
been suspected) and the much older (and currently the fastest) method
based on the work of Adleman, Pomerance and Rumely, improved by Cohen
and Lenstra (this later is sometimes called the APR-CL test).

[1] By "clear error" I mean things that are not a matter of style or
pedagogy. My list is:

(1) redundant (i.e. provably true) if (ptr > str) in strtrim

(2) array[y*10=x];

(3) if (8str == '}') in checkbalanced

(4) incorrect quotes on some char constants in the above function
(only visible in the book version)

(5) int expy;; in floating add (not an error, just an obvious typo).

(6) if(x & 0x7FFFFFFFF < y & 0x7FFFFFFF) in same (two errors here).

(7) if(tail == 100] in add (for a queue)

(8) no test for queue being full or queue being empty

(9) if(ptr == room->baddies) room->baddies = ptr; in removebaddy

(10) missing ";" after assert( x > 0.0) squareroot

(11) null pointer dereference in fixedallocate

(12) The statement above about prime testing.

--
Ben.

Mark McIntyre

unread,

Aug 12, 2007, 9:26:36 AM8/12/07

to

On Sun, 12 Aug 2007 09:09:46 -0400, in comp.lang.c , Eric Sosman
<eso...@ieee-dot-org.invalid> wrote:

>Malcolm McLean wrote:
>>
>> The normal thing when you an error such as
>> dereference of a null is to say "your code derererences a null" rather
>> than to talk airly about "catastropic behaviour" in an arrogant manner.
>
> Is dereferencing a null pointer provably non-catastrophic?

I guess the point is, when you're commenting on something its
generally considered more productive to use non-pejorative language if
you want to cause the writer to change. Obviously if your only
intention is to highlight the error and complain about the author, the
other method works fine...
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Malcolm McLean

unread,

Aug 12, 2007, 2:22:31 PM8/12/07

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:87lkcgl...@bsb.me.uk...

> "Malcolm McLean" <regn...@btinternet.com> writes:
>
>> As for the derefencing null pointer, that will happen if the
>> constructor fails, and return NULL. That is standard behaviour
>> throughout the book. If objects cannot be created, the constructing
>> function returns a null pointer to show they have failed.
>> I couldn't see another. The code has been tested, but only on two
>> systems (a UNIX mainframe and Windows) so that doesn't mean no errors
>> remain, and of course typographical errors do creep in in the process
>> of reformatting for print - I wish I had a tool to format source code
>> automatically but I don't. The normal thing when you an error such as
>> dereference of a null is to say "your code derererences a null" rather
>> than to talk airly about "catastropic behaviour" in an arrogant
>> manner.
>
> I am sorry you do not like my tone. I have certainly got worked up
> about this subject. Your fixed allocator has an error. The code
> cannot return a null pointer without causing undefined behaviour.
> Calm enough? You decide how seriously you categorise it.
>

Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.
Why not say that instead of all that vague talk about catastrophic behaviour
when the hash table gets full?
(The other bug complaint about the code getting stuck in a loop when the
table gets full is false. Which I don't mind, we can all make mistakes, but
at least admit that you are wrong. Otherwise people wonder if you really
understand hash tables at all.)

Flash Gordon

unread,

Aug 12, 2007, 3:04:00 PM8/12/07

to

Malcolm McLean wrote, On 12/08/07 19:22:

>
> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:87lkcgl...@bsb.me.uk...
>> "Malcolm McLean" <regn...@btinternet.com> writes:
>>
>>> As for the derefencing null pointer, that will happen if the
>>> constructor fails, and return NULL. That is standard behaviour
>>> throughout the book. If objects cannot be created, the constructing
>>> function returns a null pointer to show they have failed.
>>> I couldn't see another. The code has been tested, but only on two
>>> systems (a UNIX mainframe and Windows) so that doesn't mean no errors
>>> remain, and of course typographical errors do creep in in the process
>>> of reformatting for print - I wish I had a tool to format source code
>>> automatically but I don't. The normal thing when you an error such as
>>> dereference of a null is to say "your code derererences a null" rather
>>> than to talk airly about "catastropic behaviour" in an arrogant
>>> manner.
>>
>> I am sorry you do not like my tone. I have certainly got worked up
>> about this subject. Your fixed allocator has an error. The code
>> cannot return a null pointer without causing undefined behaviour.
>> Calm enough? You decide how seriously you categorise it.
>>
> Yes, that is another bug.
> fixedallocate should have an if(answer) before setting top to answer->next.

So fix it and all the other reported bugs. You should also add a change
record to the book so that people know what you have fixed.

> Why not say that instead of all that vague talk about catastrophic
> behaviour when the hash table gets full?

Perhaps he thought you should actually test the code in your book when
you are charging for it? Proper testing would have included filling the
hash table.

> (The other bug complaint about the code getting stuck in a loop when the
> table gets full is false. Which I don't mind, we can all make mistakes,
> but at least admit that you are wrong. Otherwise people wonder if you
> really understand hash tables at all.)

Actually, I would say it shows that your implementation makes it harder
for people to see how the algorithm works. A decent teaching
implementation of a hash would, IMHO, include the check rather than
relying on how another non-standard (although provided) function behaves.
--
Flash Gordon

Malcolm McLean

unread,

Aug 12, 2007, 3:24:08 PM8/12/07

to

"Flash Gordon" <sp...@flash-gordon.me.uk> wrote in message
news:3624p4x...@news.flash-gordon.me.uk...

I'm inclined to agree. You pass in a capacity, but it would be a lot clearer
if the function explicitly rejected attempts to fill the table beyond that
capacity.

Ben Bacarisse

unread,

Aug 12, 2007, 8:53:16 PM8/12/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:87lkcgl...@bsb.me.uk...
>> "Malcolm McLean" <regn...@btinternet.com> writes:
>>
>>> As for the derefencing null pointer, that will happen if the
>>> constructor fails, and return NULL. That is standard behaviour
>>> throughout the book. If objects cannot be created, the constructing
>>> function returns a null pointer to show they have failed.
>>> I couldn't see another. The code has been tested, but only on two
>>> systems (a UNIX mainframe and Windows) so that doesn't mean no errors
>>> remain, and of course typographical errors do creep in in the process
>>> of reformatting for print - I wish I had a tool to format source code
>>> automatically but I don't. The normal thing when you an error such as
>>> dereference of a null is to say "your code derererences a null" rather
>>> than to talk airly about "catastropic behaviour" in an arrogant
>>> manner.
>>
>> I am sorry you do not like my tone. I have certainly got worked up
>> about this subject. Your fixed allocator has an error. The code
>> cannot return a null pointer without causing undefined behaviour.
>> Calm enough? You decide how seriously you categorise it.
>>
> Yes, that is another bug.
> fixedallocate should have an if(answer) before setting top to answer->next.
> Why not say that instead of all that vague talk about catastrophic
> behaviour when the hash table gets full?

I am sorry that my report was not clear. I am not reacting to this
topic rationally. I find the idea if charging money for a book of
this quality hard to accept, and my anger at that has affected the
clarity of my postings.

> (The other bug complaint about the code getting stuck in a loop when
> the table gets full is false. Which I don't mind, we can all make
> mistakes, but at least admit that you are wrong. Otherwise people
> wonder if you really understand hash tables at all.)

Yes, that was my mistake. I don't think it shows your design in a
good light, however. It helps to follow a design if one can think of
the pieces separately, and the correctness of your hash functions all
rely on the allocator failing at the right point. A change to the
code to replace the allocator would break it in a mysterious way.
This kind linkage between components is not desirable and it is
particularly odd to have used it in a teaching book.

Will you accept that readers should get a refund when they see the
code you present for implementing a queue as a circular buffer?

--
Ben.

Richard Heathfield

unread,

Aug 13, 2007, 12:58:56 AM8/13/07

to

Malcolm McLean said:

>
> "ravi" <dcerav...@gmail.com> wrote in message
> news:1186820007.2...@i13g2000prf.googlegroups.com...
>> Hi
>> can anybody tell me that which ds will be best suited to implement a
>> hash table in C/C++
>>
>> thanx. in advanced
>>
> Which ds?
> You can read all about hash tables in my book, Basic Algorithms. The
> hash tables chapter is free.

Please don't do a navia on us, Malcolm. If your book is good enough,
other people here will recommend it when appropriate. And if it isn't
good enough, nobody here should be recommending it, least of all you.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Richard Heathfield

unread,

Aug 13, 2007, 1:09:06 AM8/13/07

to

CBFalconer said:

> ravi wrote:
>>
>> can anybody tell me that which ds will be best suited to implement
>> a hash table in C/C++
>
> I am not sure what your question is.

It seems pretty clear to me. But he needs to decide what language he's
writing in before he starts worrying about difficult stuff.

> However, a complete, and
> portable, hash table implementation under GPL

...does not answer the question, any more than "what's the best kind of
honey?" is answered by "here, have one of my honey sandwiches".

Richard Heathfield

unread,

Aug 13, 2007, 2:11:47 AM8/13/07

to

Malcolm McLean said:

<snip>

> There are certain advantages in making the hash table length prime.

If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage". If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

A book that covers hash tables ought, at the very least, to mention
quadratic probing, and demonstrate how it can get into a cycle with
composite hash table sizes.

For example, consider a hash table with ten elements, of which elements
0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2, 6 and 7
empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and you
will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5,
8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately, in
advance).

Primes aren't just an advantage for quadratic probing - they're
essential.

pete

unread,

Aug 13, 2007, 6:37:57 AM8/13/07

to

Richard Heathfield wrote:
>
> CBFalconer said:
>
> > ravi wrote:
> >>
> >> can anybody tell me that which ds will be best suited to implement
> >> a hash table in C/C++
> >
> > I am not sure what your question is.
>
> It seems pretty clear to me. But he needs to decide what language he's
> writing in before he starts worrying about difficult stuff.

Does "ds" really mean "difficult stuff"?

--
pete

Richard Heathfield

unread,

Aug 13, 2007, 6:51:07 AM8/13/07

to

pete said:

<grin> Pure coincidence, I'm afraid.

If he decides on C++, his answer is probably to be found in the STL. If
he chooses C, he'll just want to slap an array of objects together if
he's using linear or quadratic probing, or an array of collections
otherwise (where "collection" means something like a linked list, a
binary search tree, or maybe even another hash table).

Frodo Baggins

unread,

Aug 13, 2007, 1:41:33 PM8/13/07

to

On Aug 12, 6:19 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

> "Malcolm McLean" <regniz...@btinternet.com> writes:
> > As for the derefencing null pointer, that will happen if the
> > constructor fails, and return NULL. That is standard behaviour
> > throughout the book. If objects cannot be created, the constructing
> > function returns a null pointer to show they have failed.
> > I couldn't see another. The code has been tested, but only on two
> > systems (a UNIX mainframe and Windows) so that doesn't mean no errors
> > remain, and of course typographical errors do creep in in the process
> > of reformatting for print - I wish I had a tool to format source code
> > automatically but I don't. The normal thing when you an error such as
> > dereference of a null is to say "your code derererences a null" rather
> > than to talk airly about "catastropic behaviour" in an arrogant
> > manner.
>
> I am sorry you do not like my tone. I have certainly got worked up
> about this subject. Your fixed allocator has an error. The code
> cannot return a null pointer without causing undefined behaviour.
> Calm enough? You decide how seriously you categorise it.
>
> I think you should fix that and the other clear errors[1] that have been
> pointed out before you call other people arrogant. You say in the
> introduction that bugs can be "corrected very quickly", but you don't
> seem to have corrected anything.

I would suggest using splint (http://www.splint.org/) if you haven't
done so already.

Regards,
Frodo B

Ivar Rosquist

unread,

Aug 13, 2007, 2:27:06 PM8/13/07

to

On Sat, 11 Aug 2007 09:29:09 +0100, Malcolm McLean wrote:

> "ravi" <dcerav...@gmail.com> wrote in message
> news:1186820007.2...@i13g2000prf.googlegroups.com...
>> Hi

>> can anybody tell me that which ds will be best suited to implement a
>> hash table in C/C++
>>

>> thanx. in advanced
>>
> Which ds?
> You can read all about hash tables in my book, Basic Algorithms. The
> hash tables chapter is free.

Don't bother - that book probably is the buggiest book of the
year.

Malcolm McLean

unread,

Aug 13, 2007, 4:40:52 PM8/13/07

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:878x8gl...@bsb.me.uk...

>
>> Yes, that is another bug.
>> fixedallocate should have an if(answer) before setting top to answer-
>> next.
>> Why not say that instead of all that vague talk about catastrophic
>> behaviour when the hash table gets full?
>
> I am sorry that my report was not clear. I am not reacting to this
> topic rationally. I find the idea if charging money for a book of
> this quality hard to accept, and my anger at that has affected the
> clarity of my postings.
>

It's surprising how often people react like that. I get similar accusations
all the time about 12 Common Atheist Arguments (refuted). There of course
there can't be a technical justification.

>
>> (The other bug complaint about the code getting stuck in a loop when
>> the table gets full is false. Which I don't mind, we can all make
>> mistakes, but at least admit that you are wrong. Otherwise people
>> wonder if you really understand hash tables at all.)
>
> Yes, that was my mistake. I don't think it shows your design in a
> good light, however. It helps to follow a design if one can think of
> the pieces separately, and the correctness of your hash functions all
> rely on the allocator failing at the right point. A change to the
> code to replace the allocator would break it in a mysterious way.
> This kind linkage between components is not desirable and it is
> particularly odd to have used it in a teaching book.
>

I think it is clear that the code does need a test against capacity, and
some documentation. So somethign worthwhile has come of all this.

>
> Will you accept that readers should get a refund when they see the
> code you present for implementing a queue as a circular buffer?
>

Maybe it's just me incapable of seeing the problems with my own code, but
what is wrong with a circular buffer? Sure, it doesn't grown dynamically,
but generally you want to ration queue length anyway. If it consistently has
more coming in than you're taking out, there's probably something wrong with
your logic, or the machine is underpowered.
When edition 2 comes out I might give courtesy copies to edition 1 readers
who point out bugs. Maybe even to clc reviewers who have been kind enough to
point out problems. I'm not ungrateful, though sometimes I get a little bit
annoyed at the tone. The snag is that most of the price represents printing
costs, compartively little goes towards my luxury yacht.

Malcolm McLean

unread,

Aug 13, 2007, 5:04:16 PM8/13/07

to

"Richard Heathfield" <r...@see.sig.invalid> wrote in message
news:dYudnaLT440OaCLb...@bt.com...

> Malcolm McLean said:
>
> <snip>
>
>> There are certain advantages in making the hash table length prime.
>
> If you use quadratic probing as your collision avoidance mechanism, a
> prime hash table length would seem to me to be essential, not just a
> "certain advantage". If, on the other hand, you treat the hash table as
> an array of collections of some kind, then I see no advantage at all to
> the hash table length being prime.
>
> A book that covers hash tables ought, at the very least, to mention
> quadratic probing, and demonstrate how it can get into a cycle with
> composite hash table sizes.
>
> For example, consider a hash table with ten elements, of which elements
> 0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2, 6 and 7
> empty, so the table is only 60% full).
>
> Now try to add an element at index 4, using quadratic probing, and you
> will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5,
> 8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.
>
> Using a prime number guarantees that you won't get into this cycle
> unless the table is actually full (which you can check separately, in
> advance).
>
> Primes aren't just an advantage for quadratic probing - they're
> essential.
>

The hash table chapter includes an example of quadratic probing, which does
as you say.
Maybe the primality test should be more rigorous? The worst code is that
which seems to work, then fails when someone includes it in a life-support
system.

Mark McIntyre

unread,

Aug 13, 2007, 6:51:19 PM8/13/07

to

On Mon, 13 Aug 2007 21:40:52 +0100, in comp.lang.c , "Malcolm McLean"
<regn...@btinternet.com> wrote:

>"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
>news:878x8gl...@bsb.me.uk...
>>

>> I am sorry that my report was not clear. I am not reacting to this
>> topic rationally.
>>

>It's surprising how often people react like that. I get similar accusations
>all the time about 12 Common Atheist Arguments (refuted).

<OT>
What, you mean as opposed to the irrationality of the book itself?
</ot>

Ben Bacarisse

unread,

Aug 13, 2007, 9:28:34 PM8/13/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:878x8gl...@bsb.me.uk...
>>
>>> Yes, that is another bug.
>>> fixedallocate should have an if(answer) before setting top to answer-
>>> next.
>>> Why not say that instead of all that vague talk about catastrophic
>>> behaviour when the hash table gets full?
>>
>> I am sorry that my report was not clear. I am not reacting to this
>> topic rationally. I find the idea if charging money for a book of
>> this quality hard to accept, and my anger at that has affected the
>> clarity of my postings.
>>
> It's surprising how often people react like that. I get similar
> accusations all the time about 12 Common Atheist Arguments
> (refuted).

Do you push that book in groups where only a critique of the grammar
is on topic? If so, then I can see the similarity. If not, then you
missed my point. I am trying very hard to be topical for c.l.c. That
explains, I think, what some have seen as nitpicking. Post in
comp.programming or comp.theory if you want to see some comments on
your book as a whole.

> There of course there can't be a technical justification.

You have a strange sense of humour (I assume that was a joke?).

<snip>

>> Will you accept that readers should get a refund when they see the
>> code you present for implementing a queue as a circular buffer?
>>
> Maybe it's just me incapable of seeing the problems with my own code,
> but what is wrong with a circular buffer? Sure, it doesn't grown
> dynamically, but generally you want to ration queue length anyway. If
> it consistently has more coming in than you're taking out, there's
> probably something wrong with your logic, or the machine is
> underpowered.

<stay clam>...

No. You are thinking of one kind of queue (where you do want to
balance input rate and output rate) and you've generalised from that.
There are uses of queues where your last statement is simply wrong.
For example, in a breadth first search of a tree using a queue,
continued growth of the queue simply indicates something about the
shape of the tree, not an underpowered machine or faulty logic.

Of course, limiting the queue size is often done, and a circular
buffer is an excellent way to implement a bounded queue. My objection
was that the code you present does not implement a bounded queue.
What you wrote

int queue[100];
int head;
int tail;

void add(int x)
{
if(tail == 100] /* sic */
tail = 0;
queue[tail++] = x;
}

int remove(void)
{
int answer = queue[head];
head++;
if(head == 100)
head = 0;
return answer;
}

is just a circular buffer. A bounded queue would have 'full' and
'empty' tests. The add operation would fail if the queue were full,
and the remove operation (badly named, as has been pointed out) would
fail if it were empty.

What algorithms would your queue implementation be suitable for?

--
Ben.

Richard Heathfield

unread,

Aug 14, 2007, 12:09:30 AM8/14/07

to

Malcolm McLean said:
> "Richard Heathfield" wrote...
<snip>

>>
>> Primes aren't just an advantage for quadratic probing - they're
>> essential.
>>
> The hash table chapter includes an example of quadratic probing, which
> does as you say.

Well, I didn't buy the book, so I wouldn't know.

> Maybe the primality test should be more rigorous? The worst code is
> that which seems to work, then fails when someone includes it in a
> life-support system.

The primality test should be *definitive*. There's no excuse for it not
to be. It took me less than 30 seconds - on a relatively ancient
machine - to generate all of the 5761455 primes in the range 1 to
100,000,000. And less than two seconds to generate all the 664579
primes in the range 1 to 10,000,000.

Just how big a hash table did your users want? If it has more than a
hundred million entries, I suspect that the cost of definitive
primality testing will be the least of your problems.

James Dow Allen

unread,

Aug 14, 2007, 3:58:25 AM8/14/07

to

On Aug 12, 6:11 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

> Malcolm McLean said:
> > There are certain advantages in making the hash table length prime.
>
> If you use quadratic probing as your collision avoidance mechanism, a
> prime hash table length would seem to me to be essential, not just a
> "certain advantage".

A robust quadratic prober will switch to
linear probing after "a while" (since it
would otherwise be able to access only half
the indexes). For this reason, Richard's
comment would be more robust if "essential"
is replaced with "very highly desireable."

> If, on the other hand, you treat the hash table as
> an array of collections of some kind, then I see no advantage at all to
> the hash table length being prime.

I think there is a good practical reason to
use prime size, unrelated to quadratic probing.

A hash function needs to accomplish two
things: scramble, and reduce to range {0,1,...,N-1}
where N is table size. Two birds can be
(partially) killed with one stone when the last
step is taking remainder (index %= N), but this
scrambles best when N is prime.

(Flamers will be happy to note that we "should"
have scrambled well first but, unless we're
extremely confident in our hash function, why
not get the scrambling benefit of prime-number
remaindering?)

Another interesting issue, related to hashing
though perhaps unrelated to table size, concerns
reversible hash functions ("scrambling bijections").
These are easy to implement on {0,1,2,...,N-1}
*if* N is prime.

James Dow Allen

Richard Heathfield

unread,

Aug 14, 2007, 4:50:48 AM8/14/07

to

James Dow Allen said:

<snip>

> A robust quadratic prober will switch to
> linear probing after "a while" (since it
> would otherwise be able to access only half
> the indexes). For this reason, Richard's
> comment would be more robust if "essential"
> is replaced with "very highly desireable."

I'll buy that. Here's your dollar.

<snip>

> I think there is a good practical reason to
> use prime size, unrelated to quadratic probing.
>
> A hash function needs to accomplish two
> things: scramble, and reduce to range {0,1,...,N-1}
> where N is table size. Two birds can be
> (partially) killed with one stone when the last
> step is taking remainder (index %= N), but this
> scrambles best when N is prime.

Two bucks in one day? Hmmm. I'm not talking to you any more without my
lawyer.

But seriously, there are also advantages in making the table size an
integer power of 2 (which, in all but one case, fails the primality
test with a vengeance), since the hash value can then be reduced to the
table size more rapidly.

Measure, measure, measure!

Army1987

unread,

Aug 14, 2007, 5:38:51 AM8/14/07

to

I wouldn't. Many of the warning it finds are bogus. It claims that
in long k = 1; printf("%g\n", k / 2.0) the second argument to
printf() has a wrong type. It complains that the controlling
expression of if(!ferror(stdout)) isn't Boolean, and suggests me
a way to disable warnings when pointers are used as controlling
expressions. It complains about "assigning a character to an int"
when I write i = '\n', but then it remembers "anyway it's safe,
since actually '\n' has type int". If I turn up the warning level
enough, it will also complain about the signed operands in (1<<8)
in the macro expansion of isspace(x) found in ctype.h.
Or better, use it, but do not take what it says too seriously.
--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

CBFalconer

unread,

Aug 13, 2007, 2:33:05 PM8/13/07

to

Malcolm McLean wrote:
> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
>

... snip ...

>
>> I am sorry you do not like my tone. I have certainly got worked up
>> about this subject. Your fixed allocator has an error. The code
>> cannot return a null pointer without causing undefined behaviour.
>> Calm enough? You decide how seriously you categorise it.
>
> Yes, that is another bug. fixedallocate should have an if(answer)
> before setting top to answer->next. Why not say that instead of
> all that vague talk about catastrophic behaviour when the hash
> table gets full? (The other bug complaint about the code getting
> stuck in a loop when the table gets full is false. Which I don't
> mind, we can all make mistakes, but at least admit that you are
> wrong. Otherwise people wonder if you really understand hash
> tables at all.)

I haven't looked at your code, but it appears to be bug infested
from the comments here. Take a look at my hashlib package, which
has been out for some years, and I have had zero bug reports since
the first revision (and that was for a minor memory loss on
closing). Very stable, and purely standard C content. See:

<http://cbfalconer.home.att.net/download>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

CBFalconer

unread,

Aug 13, 2007, 2:38:37 PM8/13/07

to

Richard Heathfield wrote:
> CBFalconer said:
>> ravi wrote:
>>>
>>> can anybody tell me that which ds will be best suited to implement
>>> a hash table in C/C++
>>
>> I am not sure what your question is.
>
> It seems pretty clear to me. But he needs to decide what language
> he's writing in before he starts worrying about difficult stuff.
>
>> However, a complete, and
>> portable, hash table implementation under GPL
>
> ...does not answer the question, any more than "what's the best kind
> of honey?" is answered by "here, have one of my honey sandwiches".

What question? What's a 'ds'? What's "C/C++"? In c.l.c I think
we can assume the language is intended to be C.

CBFalconer

unread,

Aug 13, 2007, 2:24:54 PM8/13/07

to

Richard Heathfield wrote:
>
... snip ...

>
> For example, consider a hash table with ten elements, of which
> elements 0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2,
> 6 and 7 empty, so the table is only 60% full).
>
> Now try to add an element at index 4, using quadratic probing, and
> you will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8,
> 5, 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.
>
> Using a prime number guarantees that you won't get into this cycle
> unless the table is actually full (which you can check separately,
> in advance).
>
> Primes aren't just an advantage for quadratic probing - they're
> essential.

That's a nice, compact, and clear explanation. I tried explaining
this to someone some time ago (1 year or more) without success. He
continued to insist that non-prime sizes were suitable.

Richard Heathfield

unread,

Aug 14, 2007, 1:53:43 PM8/14/07

to

CBFalconer said:

> Richard Heathfield wrote:
>> CBFalconer said:

<snip>

>>> However, a complete, and
>>> portable, hash table implementation under GPL
>>
>> ...does not answer the question, any more than "what's the best kind
>> of honey?" is answered by "here, have one of my honey sandwiches".
>
> What question?

It seems that he wishes to know what primitive data structures would be
most suitable for implementing a hash table. Elsethread, I have
answered this question already, so I won't repeat the answer here.

> What's a 'ds'?

The consensus appears to be that he means "data structure".

> What's "C/C++"? In c.l.c I think
> we can assume the language is intended to be C.

You have answered your own question. :-)

user923005

unread,

Aug 14, 2007, 3:01:47 PM8/14/07

to

On Aug 13, 11:24 am, CBFalconer <cbfalco...@yahoo.com> wrote:
> Richard Heathfield wrote:
>
> ... snip ...
>
> > For example, consider a hash table with ten elements, of which
> > elements 0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2,
> > 6 and 7 empty, so the table is only 60% full).
>
> > Now try to add an element at index 4, using quadratic probing, and
> > you will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8,
> > 5, 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.
>
> > Using a prime number guarantees that you won't get into this cycle
> > unless the table is actually full (which you can check separately,
> > in advance).
>
> > Primes aren't just an advantage for quadratic probing - they're
> > essential.
>
> That's a nice, compact, and clear explanation. I tried explaining
> this to someone some time ago (1 year or more) without success. He
> continued to insist that non-prime sizes were suitable.

This assumes that quadratic probing is the best overflow handler.

I have found that hash tables which are an integral power of 2 in size
have definite advantages in some situations.
To calculate the appropriate address you can simply do this
(simplified example):

struct hashtab hash_table[1u << 24];
unsigned long long mask = (1u << 24)-1;

...

struct hashtab *entry = &hash_table[current_hash & mask];

It comes up a lot in game programming because we don't worry about
overflow. If a new entry looks better than an old one, we simply
overwrite it.

And when we need to store collision entries we can either roll the egg
out of the nest (cuckoo), or use hash tables of skiplists.

Of course, every different scheme has some advantages and
disadvantages. I did not read the book in question, but it sounds
like it could have used better editing.

Eric Sosman

unread,

Aug 14, 2007, 3:36:18 PM8/14/07

to

user923005 wrote On 08/14/07 15:01,:
> [...]

>
> I have found that hash tables which are an integral power of 2 in size
> have definite advantages in some situations.
> To calculate the appropriate address you can simply do this
> (simplified example):
>
> struct hashtab hash_table[1u << 24];
> unsigned long long mask = (1u << 24)-1;

Just a couple side-issues, not really related to hash
tables as such:

1) On a system whose ints have 16,17,...,24 bits, the
expression `1u << 24' yields undefined behavior,
not the intended 16777216. Suggestion: Change
`1u' to `1uL' in both places, or write 0x1000000
instead.

2) The range of `unsigned long long' vastly exceeds
the desired mask value of 16777215, and its use
may impose performance penalties. (Even on systems
with 64-bit hardware, you may wind up using two
CPU registers instead of just one.) Suggestion:
Change `unsigned long long' to `unsigned long', or
even to `unsigned int' if UINT_MAX suffices.

(I'm sure user923005 is aware of these matters, but
less-experienced readers of his "simplified example" may
not be.)

--
Eric....@sun.com

Malcolm McLean

unread,

Aug 14, 2007, 3:50:35 PM8/14/07

to

"CBFalconer" <cbfal...@yahoo.com> wrote in message
news:46C0A3E1...@yahoo.com...

> Malcolm McLean wrote:
> I haven't looked at your code, but it appears to be bug infested
> from the comments here. Take a look at my hashlib package, which
> has been out for some years, and I have had zero bug reports since
> the first revision (and that was for a minor memory loss on
> closing). Very stable, and purely standard C content. See:
>

There are a few glitches which will come out in edition 2.

There are also some design issues, like the avoidance of size_t.

The code is meant to illustrate the algorithm, so that people who don't know
what a hash table is can see how one works. It is not meant to be an
industrial-strength application. If you want a drop-in hashtable, I'd
recommend your package over mine. They are not meant to do the same job.

Malcolm McLean

unread,

Aug 14, 2007, 3:57:50 PM8/14/07

to

"Mark McIntyre" <markmc...@spamcop.net> wrote in message
news:k1o1c39nq7kp6f4a4...@4ax.com...

> On Mon, 13 Aug 2007 21:40:52 +0100, in comp.lang.c , "Malcolm McLean"
> <regn...@btinternet.com> wrote:
>
>>"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
>>news:878x8gl...@bsb.me.uk...
>>>
>>> I am sorry that my report was not clear. I am not reacting to this
>>> topic rationally.
>>>
>>It's surprising how often people react like that. I get similar
>>accusations
>>all the time about 12 Common Atheist Arguments (refuted).
>
> <OT>
> What, you mean as opposed to the irrationality of the book itself?
> </ot>
>

As you might expect a lot of atheists are very hostile to my religious
books. What is interesting is how similar the response is to Basic
Algorithms. They are quite different subjects, and there is no reason why
someone who agrees with me on Christiamity should see eye to eye on
programming matters. But the things said are almost identical - I regularly
get demands to withdraw the book because it doesn't contain a definitive
proof of God's existence, for instance (it doesn't claim to, it refutes 12
Common Atheist Arguments, not the same thing as proving Christianity to be
true). In case of Basic Algorithms the pretext is technical, of course, but
I think the basic motive is the same. People see a book as something
socially unacceptable.

user923005

unread,

Aug 14, 2007, 4:42:05 PM8/14/07

to

Excellent observations.
Aside:
In the case of chess programs, 32 bit hash values are not large
enough.
If we compute 1 million nodes per second (not at all unusual) and
analyze for correspondence chess (e.g. 24 hour search = 86400 seconds)
there will be a huge number of collisions. With a 64 bit hash, we can
easily detect them.
We use the bottom n-bits to compute the hash, and then the top n-bits
are stored in the table as a key-check. Only if both the address and
key-check bits match do we conclude that we have found a match.
Because of the 64 squares on a chessboard, bitboard representations
are quite common and so most of the math is using 64 bit operations
anyway. Hence many modern chess program benefit mightily from 64 bit
operating systems and compilers.

Keith Thompson

unread,

Aug 14, 2007, 5:56:02 PM8/14/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:
[...]

> As you might expect a lot of atheists are very hostile to my religious
> books. What is interesting is how similar the response is to Basic
> Algorithms. They are quite different subjects, and there is no reason
> why someone who agrees with me on Christiamity should see eye to eye
> on programming matters. But the things said are almost identical - I
> regularly get demands to withdraw the book because it doesn't contain
> a definitive proof of God's existence, for instance (it doesn't claim
> to, it refutes 12 Common Atheist Arguments, not the same thing as
> proving Christianity to be true). In case of Basic Algorithms the
> pretext is technical, of course, but I think the basic motive is the
> same. People see a book as something socially unacceptable.

I think that's an extreme case of wishful thinking on your part.

I won't discuss your "12 Common Atheist Arguments" book here, both
because I haven't read it and because it's about as far off-topic as
anything I can imagine.

I don't believe that anybody has any objection to the idea of a book
on basic algorithms. There is absolutely nothing "socially
unacceptable" about such a book. People are objecting to the errors
in your book. If you had written a *good* book on basic algorithms,
nobody would complain.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Malcolm McLean

unread,

Aug 14, 2007, 6:25:16 PM8/14/07

to

"Keith Thompson" <ks...@mib.org> wrote in message

> I don't believe that anybody has any objection to the idea of a book
> on basic algorithms. There is absolutely nothing "socially
> unacceptable" about such a book. People are objecting to the errors
> in your book. If you had written a *good* book on basic algorithms,
> nobody would complain.
>

The objection is to a regular publishing such a book.
I have written a good book. Not a single objection has been non-trivial - a
few bugs, but dropped stitches rather than deep-seated problems, and
stylistic issues, plus a detail on deterministic algorithms for prime
numbers. However they have been magnified into an assertion that I am
"unqualified to write" a book on algorithms. The technical objections,
though real issues, are obviously a pretext for something a bit deeper.

Guys, I've done it.

Kelsey Bjarnason

unread,

Aug 14, 2007, 6:28:36 PM8/14/07

to

[snips]

On Tue, 14 Aug 2007 20:57:50 +0100, Malcolm McLean wrote:

> As you might expect a lot of atheists are very hostile to my religious
> books.

If they're on a par with your programming books, no wonder.

> What is interesting is how similar the response is to Basic
> Algorithms. They are quite different subjects, and there is no reason why
> someone who agrees with me on Christiamity should see eye to eye on
> programming matters.

Er... the two have bugger all to do with each other. I don't care if the
coder next to me is Christian, Wiccan, Atheist or what; I care whether I
can work with him and whether his code is any good.

> Christianity to be true). In case of Basic Algorithms the pretext is
> technical, of course, but I think the basic motive is the same.

Rejecting bogus tripe? You're probably right.

user923005

unread,

Aug 14, 2007, 6:44:44 PM8/14/07

to

On Aug 14, 3:25 pm, "Malcolm McLean" <regniz...@btinternet.com> wrote:
> "Keith Thompson" <ks...@mib.org> wrote in message
> > I don't believe that anybody has any objection to the idea of a book
> > on basic algorithms. There is absolutely nothing "socially
> > unacceptable" about such a book. People are objecting to the errors
> > in your book. If you had written a *good* book on basic algorithms,
> > nobody would complain.
>
> The objection is to a regular publishing such a book.

I know of regulars posting books before, and there was not a lot of
flack generated.

> I have written a good book. Not a single objection has been non-trivial - a
> few bugs, but dropped stitches rather than deep-seated problems, and
> stylistic issues, plus a detail on deterministic algorithms for prime
> numbers. However they have been magnified into an assertion that I am
> "unqualified to write" a book on algorithms. The technical objections,
> though real issues, are obviously a pretext for something a bit deeper.
>
> Guys, I've done it.

I suggest Knuth's method:
Offer a reward of (say) $1 for every mistake found (but only the first
time the mistake is found).
That way, people will know that you are sincere in your desire for
correctness.

Alternatively, incorporate the feedback you receive to improve the
second edition.

I suggest that your editor was not technical enough to aid you in the
publishing process.
If the mistake list that I have seen is correct, then I guess you need
an editor with more technical ability.

Suggestion:
On your next pass {if you do another one} don't publish anything that
won't stand up to lint.
Also, where possible, do an exhaustive test of all possible inputs.
After all, if you are publishing a book, people are likely to use it
as a reference.

I am thinking about writing a book about hybrid algorithms.
If I can ever find the time.

Ben Bacarisse

unread,

Aug 14, 2007, 6:55:54 PM8/14/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "Keith Thompson" <ks...@mib.org> wrote in message
>> I don't believe that anybody has any objection to the idea of a book
>> on basic algorithms. There is absolutely nothing "socially
>> unacceptable" about such a book. People are objecting to the errors
>> in your book. If you had written a *good* book on basic algorithms,
>> nobody would complain.
>>
> The objection is to a regular publishing such a book.
> I have written a good book. Not a single objection has been
> non-trivial

I think a useless queue[1] is rather more than trivial. It is quite
misleading about what one can do with a queue, but even at that level
of detail it is off-topic here.

> - a few bugs, but dropped stitches rather than deep-seated
> problems, and stylistic issues, plus a detail on deterministic
> algorithms for prime numbers.

You continue to side-step the fact that your posted in c.l.c. Ask for
reviews in groups where either algorithms or teaching CS texts are
topical to see if anyone has deeper objections.

[1] Maybe "virtually useless" is better because I can think of one
rather specialised use for your non-queue.

--
Ben.

Ben Pfaff

unread,

Aug 14, 2007, 7:04:04 PM8/14/07

to

user923005 <dco...@connx.com> writes:

> I am thinking about writing a book about hybrid algorithms.
> If I can ever find the time.

I think that I could write an entire book about implementing
linked lists in C, along the lines of GNU libavl. Not sure that
anyone would read it though.
--
Ben Pfaff
http://benpfaff.org

user923005

unread,

Aug 14, 2007, 7:11:24 PM8/14/07

to

On Aug 14, 4:04 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:

> user923005 <dcor...@connx.com> writes:
> > I am thinking about writing a book about hybrid algorithms.
> > If I can ever find the time.
>
> I think that I could write an entire book about implementing
> linked lists in C, along the lines of GNU libavl. Not sure that
> anyone would read it though.

I promise to buy a copy.

Keith Thompson

unread,

Aug 14, 2007, 7:17:59 PM8/14/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:
> "Keith Thompson" <ks...@mib.org> wrote in message
>> I don't believe that anybody has any objection to the idea of a book
>> on basic algorithms. There is absolutely nothing "socially
>> unacceptable" about such a book. People are objecting to the errors
>> in your book. If you had written a *good* book on basic algorithms,
>> nobody would complain.
>>
> The objection is to a regular publishing such a book.

That's complete nonsense. Present one example of a book other than
yours being criticized because it was published by a regular.

Richard Heathfield is a co-author of a book on C. Why hasn't he
received the kind of abuse you've gotten here? Because his book
doesn't have the kind of blatant errors that yours does.

P.J. Plauger is a semi-regular. Even Dennis Ritchie posts here
occasionally. People have pointed out errors in their books, but
there just aren't all that many to point out.

> I have written a good book. Not a single objection has been
> non-trivial - a few bugs, but dropped stitches rather than deep-seated
> problems, and stylistic issues, plus a detail on deterministic
> algorithms for prime numbers. However they have been magnified into an
> assertion that I am "unqualified to write" a book on algorithms. The
> technical objections, though real issues, are obviously a pretext for
> something a bit deeper.
>
> Guys, I've done it.

People are criticizing your book because of its content. It has
nothing to do with the fact that you wrote it.

Well, that may not be entirely true. You're probably held to a higher
standard because, as a regular, we expect you to know C. But again,
the criticism is based on the actual content of the book. And if your
book had been published by an outsider, we probably wouldn't be
discussing it in the first place.

Eric Sosman

unread,

Aug 14, 2007, 10:08:37 PM8/14/07

to

Malcolm McLean wrote:
>
> "Keith Thompson" <ks...@mib.org> wrote in message
>> I don't believe that anybody has any objection to the idea of a book
>> on basic algorithms. There is absolutely nothing "socially
>> unacceptable" about such a book. People are objecting to the errors
>> in your book. If you had written a *good* book on basic algorithms,
>> nobody would complain.
>>
> The objection is to a regular publishing such a book.
> I have written a good book.

You have done no such thing. I have read good books,
and I know.

> Not a single objection has been non-trivial
> - a few bugs, but dropped stitches rather than deep-seated problems, and
> stylistic issues, plus a detail on deterministic algorithms for prime
> numbers. However they have been magnified into an assertion that I am
> "unqualified to write" a book on algorithms. The technical objections,
> though real issues, are obviously a pretext for something a bit deeper.
> Guys, I've done it.

Code that won't compile -- "trivial?"

Code that doesn't work -- "a detail?"

Code that's inefficient -- "stylistic?"

An author who can't even tell whether the objections are
wrong or right -- "qualified?"

When you "drop stitches" on a space suit, you deserve
to fly in it.

I admit to a revulsion towards your frightful book. But it's
no pretext for anything, deeper or shallower: your frightful book
merits opprobrium. The notion that anyone would accept money from
unsuspecting dupes in exchange for this piece of rot is repugnant.
And when the author moves from "I'll accept money" to "Buy my good
book!" as in this thread...! My revulsion is redoubled, my gorge
runneth over.

--
Eric Sosman
eso...@ieee-dot-org.invalid

CBFalconer

unread,

Aug 14, 2007, 11:12:26 PM8/14/07

to

Eric Sosman wrote:
> Malcolm McLean wrote:
>
... snip ...

>>
>> The objection is to a regular publishing such a book.
>> I have written a good book.
>
> You have done no such thing. I have read good books, and I know.

... snip all further elucidation ...

I suspect you need go no further to make an enemy.

Richard Heathfield

unread,

Aug 15, 2007, 4:39:32 AM8/15/07

to

Malcolm McLean said:

> "Mark McIntyre" <markmc...@spamcop.net> wrote in message
> news:k1o1c39nq7kp6f4a4...@4ax.com...
>> On Mon, 13 Aug 2007 21:40:52 +0100, in comp.lang.c , "Malcolm McLean"
>> <regn...@btinternet.com> wrote:
>>
>>>"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
>>>news:878x8gl...@bsb.me.uk...
>>>>
>>>> I am sorry that my report was not clear. I am not reacting to this
>>>> topic rationally.
>>>>
>>>It's surprising how often people react like that. I get similar
>>>accusations
>>>all the time about 12 Common Atheist Arguments (refuted).
>>
>> <OT>
>> What, you mean as opposed to the irrationality of the book itself?
>> </ot>
>>
> As you might expect a lot of atheists are very hostile to my religious
> books.

Someone who claims that your Refutations book is irrational need not
necessarily be an atheist. I haven't read it myself, so I am commenting
only generally, but it seems to me that there are many possibilities:

Your book is rational, your critic is an atheist, your critic is wrong;
your book is irrational, your critic is an atheist, your critic is
right; your book is rational, your critic is of another faith, your
critic is wrong; your book is irrational, your critic is of another
faith, your critic is right; your book is rational, your critic is a
Christian, your critic is wrong; or your book is irrational, your
critic is a Christian, and your critic is right.

But these boil down to just two truly distinct possibilities - either
the book is irrational (in which case the criticism is correct) or it
is not (in which case the criticism is incorrect), and the faith or
otherwise of the critic is of no particular relevance.

> What is interesting is how similar the response is to Basic
> Algorithms. They are quite different subjects, and there is no reason
> why someone who agrees with me on Christiamity should see eye to eye
> on programming matters.

Indeed - and in fact other Christians might not even agree with you on
Christianity. Or they might. It's a broad church (if I may use that
expression in this context!).

> But the things said are almost identical - I
> regularly get demands to withdraw the book because it doesn't contain
> a definitive proof of God's existence, for instance

Neither does the Bible, but I don't see anyone clamouring for it to be
withdrawn.

> (it doesn't claim
> to, it refutes 12 Common Atheist Arguments, not the same thing as
> proving Christianity to be true).

Right. The book should be judged on its merits. If the refutations are
of poor quality, or are easily refuted themselves, or attack the wrong
arguments (e.g. arguments that are not commonly used by atheists), then
the book is a poor book. That doesn't mean it should be withdrawn,
however. The world needs horrible warnings just as much as it needs
good examples.

> In case of Basic Algorithms the
> pretext is technical, of course, but I think the basic motive is the
> same.

The motive is technical. If your purpose is to illustrate algorithms, I
suggest that you use either pseudocode or a language you know far
better than you know C.

> People see a book as something socially unacceptable.

Not in comp.lang.c they don't.

Very.Little.Gravitas.Indeed

unread,

Aug 15, 2007, 4:50:45 AM8/15/07

to

On Aug 15, 12:25 am, "Malcolm McLean" <regniz...@btinternet.com>
wrote:

> However they have been magnified into an assertion that I am
> "unqualified to write" a book on algorithms. The technical objections,
> though real issues, are obviously a pretext for something a bit deeper.

That was not what the assertion was, it was you are unqualified to use
C as a language to descibe the algorithms you are describing in your
books.

C is a strictly defined language which has been around for years, not
only has it standards it has conventions. yet you ignore all those and
choose to do it your own way.

If you are using C you should do so rigorously, anyone who finds a
book with C code in it is going to try to use that code when they find
it doesn't work, it's not going to teach them anything, except they
just wasted money on a poor book.

If you choose to use your own ideas and conventions on how C should be
written then the your algorithms are lost in your personal ideas about
how C is or isn't a good language. Instead of looking at your
algorithms and thinking why they work or how they work, you look at
the code and you wonder why the standards and conventions were not
followed and why the code doesn't work.

If you don't follow any standards in C or conventions, then why use C
at all, use a pseudo code, because any advantages from using C go out
the window when you ignore the stanards and conventions that are used.

Christopher Benson-Manica

unread,

Aug 15, 2007, 2:12:51 PM8/15/07

to

[comp.lang.c] Richard Heathfield <r...@see.sig.invalid> wrote:

> Malcolm McLean said:

>> (a bunch of stuff about atheism, all OT)

> (a bunch of stuff about atheism, all OT)

Can we possibly take discussion of Malcolm's anti-atheism book (or
whatever it's about) someplace where it's on topic? Thanks.

>> People see a book as something socially unacceptable.

I'm trying to imagine a context where this statement is reasonable,
without success.

--
C. Benson Manica | I appreciate all corrections, polite or otherwise.
cbmanica(at)gmail.com |
----------------------| I do not currently read any posts posted through
sdf.lonestar.org | Google groups, due to rampant unchecked spam.

Richard Heathfield

unread,

Aug 15, 2007, 3:23:23 PM8/15/07

to

Christopher Benson-Manica said:

> [comp.lang.c] Richard Heathfield <r...@see.sig.invalid> wrote:
>
>> Malcolm McLean said:
>
>>> (a bunch of stuff about atheism, all OT)
>
>> (a bunch of stuff about atheism, all OT)

Er, I didn't actually say anything about atheism. But yes, I know what
you mean.

> Can we possibly take discussion of Malcolm's anti-atheism book (or
> whatever it's about) someplace where it's on topic? Thanks.

It was more of a meta-discussion really - the intent was to encourage
Malcolm to fold my points back onto the discussion of his algorithms
book. But your point is nevertheless well-taken.

>>> People see a book as something socially unacceptable.
>
> I'm trying to imagine a context where this statement is reasonable,
> without success.

Likewise. Especially here in comp.lang.c, which is one of the more
literate groups on Usenet.

Malcolm McLean

unread,

Aug 15, 2007, 5:02:20 PM8/15/07

to

"Richard Heathfield" <r...@see.sig.invalid> wrote in message
news:QvudnVF67LK9z17b...@bt.com...

> It was more of a meta-discussion really - the intent was to encourage
> Malcolm to fold my points back onto the discussion of his algorithms
> book. But your point is nevertheless well-taken.
>

I observed that the reaction to Basic Algorithms was almost identical to the
reaction to 12 Common Atheist Arguments (refuted). I wouldn't really want to
pursue the point further, but it is telling.

Richard Heathfield

unread,

Aug 15, 2007, 5:08:08 PM8/15/07

to

Malcolm McLean said:

>
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> news:QvudnVF67LK9z17b...@bt.com...
>> It was more of a meta-discussion really - the intent was to encourage
>> Malcolm to fold my points back onto the discussion of his algorithms
>> book. But your point is nevertheless well-taken.
>>
> I observed that the reaction to Basic Algorithms was almost identical
> to the reaction to 12 Common Atheist Arguments (refuted). I wouldn't
> really want to pursue the point further, but it is telling.

Indeed, but not necessarily for the reason you imagine.

Keith Thompson

unread,

Aug 15, 2007, 5:24:26 PM8/15/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
> news:QvudnVF67LK9z17b...@bt.com...
>> It was more of a meta-discussion really - the intent was to encourage
>> Malcolm to fold my points back onto the discussion of his algorithms
>> book. But your point is nevertheless well-taken.
>
> I observed that the reaction to Basic Algorithms was almost identical
> to the reaction to 12 Common Atheist Arguments (refuted). I wouldn't
> really want to pursue the point further, but it is telling.

Telling? How exactly is it telling?

The reasons for the reaction to your Basic Algorithms book have been
discussed at great length. They have nothing to do with the author.
I, for one, would have been delighted if you had written and published
a *good* Basic Algorithms book, one that presented well-written code
that both presented the algorithms clearly and demonstrated a strong
command of whatever implementation language you chose (it happened to
be C). Minor errors in such a work are nearly inevitable; time
permitting, I would have been glad to review the book and point them
out so they can be corrected.

I can't claim to speak for anyone else, but I'm sure that many other
people here feel the same way.

But when several people read a sample chapter and find numerous
blatant errors (code that doesn't even compile, algorithms that
exhibit undefined behavior, stubborn refusal to use the features of
the language), blaming others for their reaction is absurd.

I might be interested in discussing your atheism book, but I won't do
so here, except to suggest that if two books by the same author
receive similar reactions, you should consider the possibility that
the common factor is the author and his writing style, not some
conspiracy by others to denigrate the author personally.

Malcolm McLean

unread,

Aug 15, 2007, 5:27:00 PM8/15/07

to

"CBFalconer" <cbfal...@yahoo.com> wrote in message

news:46C26F1A...@yahoo.com...

> Eric Sosman wrote:
>> Malcolm McLean wrote:
>>
> ... snip ...
>>>
>>> The objection is to a regular publishing such a book.
>>> I have written a good book.
>>
>> You have done no such thing. I have read good books, and I know.
> ... snip all further elucidation ...
>
> I suspect you need go no further to make an enemy.
>

No, I'm happy to have my works condemned.
Even with Ivor Rockbrain it's more a case of "this post is pure abuse, I'd
better not tolerate it" than anything personal.

Keith's point that "the general consensus on clc is that the book is no
good, therefore it is no good" was false, and naturally I had to reply to
that, by giving the real explanation. But I don't blame people for acting as
they do. Kenny McCormack is essentially right when he sees most of the
bandwidth on clc as dominance games, but where he is wrong is in imagining
that life could be any different.

Life is a game, and the vast majority of people are opponents. not enemies.

Keith Thompson

unread,

Aug 15, 2007, 5:47:36 PM8/15/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:
[...]

> Keith's point that "the general consensus on clc is that the book is
> no good, therefore it is no good" was false, and naturally I had to
> reply to that, by giving the real explanation.

[...]

I don't recall saying that. But if the general consensus on clc is
that the book is no good, you should seriously consider the
possibility that the general consensus may have some truth behind it.

Flash Gordon

unread,

Aug 15, 2007, 6:17:43 PM8/15/07

to

Malcolm McLean wrote, On 15/08/07 22:27:

> "CBFalconer" <cbfal...@yahoo.com> wrote in message
> news:46C26F1A...@yahoo.com...
>> Eric Sosman wrote:
>>> Malcolm McLean wrote:
>>>
>> ... snip ...
>>>>
>>>> The objection is to a regular publishing such a book.
>>>> I have written a good book.
>>>
>>> You have done no such thing. I have read good books, and I know.
>> ... snip all further elucidation ...
>>
>> I suspect you need go no further to make an enemy.
>>
> No, I'm happy to have my works condemned.

You should not be. You should want to improve your works until they are
considered good.

> Even with Ivor Rockbrain it's more a case of "this post is pure abuse,
> I'd better not tolerate it" than anything personal.
>
> Keith's point that "the general consensus on clc is that the book is no
> good, therefore it is no good" was false, and naturally I had to reply
> to that, by giving the real explanation.

That is what you think the reason is. I can't speak for anyone else on
this but it is not my reason for criticising it. My criticism is that
the code is poor as C code and even ignoring the C issues in at least
some cases is poor as examples of algorithms.

You should also note that the C code in "Numerical Recipes in C" has
been criticised here because it is also bad C code. IIRC on at least one
occasions the original version was complemented in the same post because
the Fortran code was good Fortran code. On the other hand, there have
been at least a couple of books by regulars that by regulars past and
present are generally well received. So criticism is not based on
whether someone is a poster here.

If you at least added an errata with the things you accept could have
been better that would help.

Alternatively, you could do all your example in your MiniBasic, then as
it is your language no one could complain that the examples were using
the language badly.

> But I don't blame people for
> acting as they do. Kenny McCormack is essentially right when he sees
> most of the bandwidth on clc as dominance games, but where he is wrong
> is in imagining that life could be any different.
>
> Life is a game, and the vast majority of people are opponents. not enemies.

If you take the attitude that most people are against you then your
treating them as opponents is likely to make them react badly to you.
If, on the other hand, you worked on the assumption that criticism was
of the material and intended to be helpful, rather than of you, then you
would find that peoples reactions to you would be far better.
--
Flash Gordon

Ben Bacarisse

unread,

Aug 15, 2007, 7:52:43 PM8/15/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "CBFalconer" <cbfal...@yahoo.com> wrote in message
> news:46C26F1A...@yahoo.com...
>> Eric Sosman wrote:
>>> Malcolm McLean wrote:
>>>
>> ... snip ...
>>>>
>>>> The objection is to a regular publishing such a book.
>>>> I have written a good book.
>>>
>>> You have done no such thing. I have read good books, and I know.
>> ... snip all further elucidation ...
>>
>> I suspect you need go no further to make an enemy.
>>
> No, I'm happy to have my works condemned.

"Alas, to wear the mantle of Galileo it is not enough that you be
persecuted by an unkind establishment, you must also be right."
Bob Park

This new twist you have thrown in (that because two things you have
written have drawn criticism, there might be something biased about the
criticism) has conveniently side-tracked the thread.

Since you complained about what you see as "trivial" criticisms, I
raised a bigger issue which you have not commented on, so I'll ask
yet again: what use is your (non) queue implementation? How could it
do anything but baffle a student? If you are worried about c.l.c
topicality, why not re-post on comp.programming where the wider issues
can be discussed?

--
Ben.

Kelsey Bjarnason

unread,

Aug 15, 2007, 8:31:19 PM8/15/07

to

[snips]

On Wed, 15 Aug 2007 22:27:00 +0100, Malcolm McLean wrote:

> No, I'm happy to have my works condemned.

Good. I'm creating a post in alt.atheism, titled "Malcom McLean", where
I'm responding to your "12 arguments" book.

Richard Heathfield

unread,

Aug 15, 2007, 8:48:51 PM8/15/07

to

Kelsey Bjarnason said:

Is there still time to change it to "Malcolm McLean"?

Not that I plan on joining the thread. To do so, I'd have to read the
book, and I don't have sufficient motivation to do that, alas. But
getting someone's name right is a matter of courtesy, is it not?

CBFalconer

unread,

Aug 15, 2007, 10:53:05 PM8/15/07

to

Malcolm McLean wrote:
> "CBFalconer" <cbfal...@yahoo.com> wrote in message

>> Eric Sosman wrote:
>>> Malcolm McLean wrote:
>>>
>> ... snip ...
>>>>
>>>> The objection is to a regular publishing such a book.
>>>> I have written a good book.
>>>
>>> You have done no such thing. I have read good books, and I know.
>> ... snip all further elucidation ...
>>
>> I suspect you need go no further to make an enemy.
>
> No, I'm happy to have my works condemned. Even with Ivor Rockbrain
> it's more a case of "this post is pure abuse, I'd better not
> tolerate it" than anything personal.
>
> Keith's point that "the general consensus on clc is that the book
> is no good, therefore it is no good" was false, and naturally I had
> to reply to that, by giving the real explanation. But I don't blame
> people for acting as they do. Kenny McCormack is essentially right
> when he sees most of the bandwidth on clc as dominance games, but
> where he is wrong is in imagining that life could be any different.

McCormack is a troll and a joke. Plonk it. Keith is generally
respected, and accurate, as is Eric Sosman.

Kelsey Bjarnason

unread,

Aug 16, 2007, 12:21:48 PM8/16/07

to

On Thu, 16 Aug 2007 00:48:51 +0000, Richard Heathfield wrote:

> Kelsey Bjarnason said:
>
>> [snips]
>>
>> On Wed, 15 Aug 2007 22:27:00 +0100, Malcolm McLean wrote:
>>
>>> No, I'm happy to have my works condemned.
>>
>> Good. I'm creating a post in alt.atheism, titled "Malcom McLean",
>> where I'm responding to your "12 arguments" book.
>
> Is there still time to change it to "Malcolm McLean"?

My bad. Even worse, I did it there, too. Oh well - apologies to Malcolm
on that one.

> Not that I plan on joining the thread. To do so, I'd have to read the
> book, and I don't have sufficient motivation to do that, alas. But
> getting someone's name right is a matter of courtesy, is it not?

So I can't type - compilers generally complain about mistyped identifiers. :)

Flash Gordon

unread,

Aug 16, 2007, 1:17:36 PM8/16/07

to

Kelsey Bjarnason wrote, On 16/08/07 17:21:

<snip>

> So I can't type - compilers generally complain about mistyped identifiers. :)

Not if you mistype them consistently :-)
--
Flash Gordon
Dyslexic SW Developer.
At least the compiler ensures I spell identifiers consistently wrong.

Kelsey Bjarnason

unread,

Aug 16, 2007, 2:35:26 PM8/16/07

to

On Thu, 16 Aug 2007 18:17:36 +0100, Flash Gordon wrote:

> Kelsey Bjarnason wrote, On 16/08/07 17:21:
>
> <snip>
>
>> So I can't type - compilers generally complain about mistyped identifiers. :)
>
> Not if you mistype them consistently :-)

If I could do that, I'd be happy. As it stands, I live in constant fear
that even the automated spell checkers around me will at some point rise
up against me for cruel and unusual punishment.

Well, not quite, but good goat, my non-formal writing does tend to contain
more than its fair share of typos.

Malcolm McLean

unread,

Aug 16, 2007, 4:18:48 PM8/16/07

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

news:87wsvw2...@bsb.me.uk...

> "Malcolm McLean" <regn...@btinternet.com> writes:
>
> Since you complained about what you see as "trivial" criticisms, I
> raised a bigger issue which you have not commented on, so I'll ask
> yet again: what use is your (non) queue implementation? How could it
> do anything but baffle a student? If you are worried about c.l.c
> topicality, why not re-post on comp.programming where the wider issues
> can be discussed?
>

I didn't complain about trivial criticisms. If there is typo that means a
closing parenthesis is a bracket, that's worth knowing and pointing out.
What I dispute is that several trivial criticisms of this sort add up to a
major criticism. I suppose there is a point at which they do, but I don't
think we're anywhere near there.

A lot of algorithm books make a big meal of the basic data structures. I
checked the Java queue classes before writing this post, and they were
pretty complicated. I takes two interfaces, the Collection and the Iterable,
and there are nine implementations - linked list queues, delay queues, and
so on. Then you've got add methods, which throw execeptions if the queue is
full, and offer methods which return an error code, so that you can decide
whether the queue filling up is or is not part of normal operation. It's
quite an impressive interface.

However the structures chapter is onyl chapter two. I want to describe the
simplicity of the structure instead of its complexity. I could have provided
something similar to the Java classes, using void pointers and access
functions. But it is unlikely that anyone would use such a function suite.
If you want a trivial queue you have a buffer, a circular buffer if you
don't want the inefficiency of shifting everything up on every remove
operation. Just as you build linked lists by incorporating a "next" pointer
in your structure, and trees built as you go. Whether this says something
good or somethign bad about C is a moot point. The fact is that no generic
basic data structure libraries ahve gained acceptance. If I was writing the
book in Java it would be different.

The whole chapter says "these are are basic data structures you will need,
here are some code fragments which show how to implement them". The reader
should be able to extend the expandable array to produce a dynamically
expanding queue, add functions to check for length and spare capacity, and
so on. An "empty" function is pretty patronising. The priority queue,
implemented efficiently, is the red black tree, but it is appropriate to
hold that back. Actually it is possible to speed up the deletion, I
believe - a web seach on "priority queue" and "efficiency" turns up a slew
of algorithms. The book is Basic Algorithms, covering a wide range of topics
in outline, not a massively detailed description of any one.

Ben Pfaff

unread,

Aug 16, 2007, 5:21:03 PM8/16/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> I could have provided something similar to the Java classes,
> using void pointers and access functions. But it is unlikely
> that anyone would use such a function suite. If you want a
> trivial queue you have a buffer, a circular buffer if you don't
> want the inefficiency of shifting everything up on every remove
> operation.

A circular buffer can be encapsulated in C and still have some
benefits for code clarity, without reducing efficiency. At
least, that's my own evaluation of my own implementation of
circular queues in C:
http://cvs.savannah.gnu.org/viewvc/*checkout*/pspp/src/libpspp/deque.h?root=pspp&revision=1.5&content-type=text%2Fplain
http://cvs.savannah.gnu.org/viewvc/*checkout*/pspp/src/libpspp/deque.c?root=pspp&revision=1.4&content-type=text%2Fplain
Commentary is welcome of course.

> The priority queue, implemented efficiently, is the red
> black tree, but it is appropriate to hold that back.

You might have a look around for papers on evaluation of priority
queue performance. Red-black trees do pretty well if I recall
correctly, but their performance is not the best in every case.
Heaps are simpler and their performance is competitive for many
purposes. Here is my general-purpose heap implementation in C:
http://cvs.savannah.gnu.org/viewvc/*checkout*/pspp/src/libpspp/heap.h?root=pspp&content-type=text%2Fplain
http://cvs.savannah.gnu.org/viewvc/pspp/src/libpspp/heap.c?root=pspp&view=log

(I am not sure that the code I link to above is
comp.lang.c-compliant, but it should not be too far from it.)
--
"The fact that there is a holy war doesn't mean that one of the sides
doesn't suck - usually both do..."
--Alexander Viro

Malcolm McLean

unread,

Aug 16, 2007, 5:58:26 PM8/16/07

to

"Ben Pfaff" <b...@cs.stanford.edu> wrote in message
news:87hcmzn...@blp.benpfaff.org...

> "Malcolm McLean" <regn...@btinternet.com> writes:
>
>> I could have provided something similar to the Java classes,
>> using void pointers and access functions. But it is unlikely
>> that anyone would use such a function suite. If you want a
>> trivial queue you have a buffer, a circular buffer if you don't
>> want the inefficiency of shifting everything up on every remove
>> operation.
>
> A circular buffer can be encapsulated in C and still have some
> benefits for code clarity, without reducing efficiency. At
> least, that's my own evaluation of my own implementation of
> circular queues in C:
>
> http://cvs.savannah.gnu.org/viewvc/*checkout*/pspp/src/libpspp/deque.h?root=pspp&revision=1.5&content-type=text%2Fplain
>
> http://cvs.savannah.gnu.org/viewvc/*checkout*/pspp/src/libpspp/deque.c?root=pspp&revision=1.4&content-type=text%2Fplain
> Commentary is welcome of course.
>
>>

/* Initializes DEQUE as an empty deque of elements ELEM_SIZE
bytes in size, with an initial capacity of at least
CAPACITY. Returns the initial deque data array. */
void *
deque_init (struct deque *deque, size_t capacity, size_t elem_size)
{
void *data = NULL;
deque_init_null (deque);
if (capacity > 0)
{
deque->capacity = 1;
while (deque->capacity < capacity)
deque->capacity <<= 1;
data = xnmalloc (deque->capacity, elem_size);
}
return data;
}

You need to check that capacity is less than 1 << (size_t_bits -1) or the
code will go into an infinite loop.
It's not a practical problem, of course, but it is typical of the sorts of
bugs that do creep into code. I was going to launch into an anti-size_t
tirade and the way in which it gives false promises of security, but I'm too
tired and I've got to prepare a mighty defence of another of my books which
is under attack on another ng.

Ben Pfaff

unread,

Aug 16, 2007, 6:30:55 PM8/16/07

to

> while (deque->capacity < capacity)
> deque->capacity <<= 1;

> You need to check that capacity is less than 1 << (size_t_bits -1) or

> the code will go into an infinite loop.

This is a good point. Thank you. However, there's no way to
solve the problem that doesn't violate the postcondition stated
in the function header, so I think I'll "solve" it by adding an
assertion.

--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

CBFalconer

unread,

Aug 16, 2007, 8:08:01 PM8/16/07

to

Malcolm McLean wrote:
>
... snip ...
>

> /* Initializes DEQUE as an empty deque of elements ELEM_SIZE
> bytes in size, with an initial capacity of at least
> CAPACITY. Returns the initial deque data array. */
> void *
> deque_init (struct deque *deque, size_t capacity, size_t elem_size)
> {
> void *data = NULL;
>
> deque_init_null (deque);
> if (capacity > 0) {
> deque->capacity = 1;
> while (deque->capacity < capacity)
> deque->capacity <<= 1;
> data = xnmalloc (deque->capacity, elem_size);
> }
> return data;
> }

Makes no sense, lacking descriptions of struct deque,
deque_init_null, xnmalloc.

James Dow Allen

unread,

Aug 17, 2007, 2:58:24 AM8/17/07

to

On Aug 13, 8:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> Two bucks in one day? Hmmm. I'm not talking to
> you any more without my lawyer.

But I thought you had a huge supply of rounding-error
halfpennies from your days hacking bank computers :-)

> But seriously, there are also advantages in making the table size an
> integer power of 2 (which, in all but one case, fails the primality
> test with a vengeance), since the hash value can then be reduced to the
> table size more rapidly.

Gak! You're not coding ala Falconer are you?
(Doing a division on every reprobe, instead
of a simple add-compare-subtract.) Most hashers
will "want to" divide by a prime for scrambling
anyway, so the index reduction is free.

(BTW, when memory is at a premium one may wish
to expand tables, when required, by less than 100%,
precluding any guarantee of power-of-two.)

Speaking of Falconer, his method (linear probing
with second hash value used for probe increment)
*requires* (even more so than quadratic probing,
see upthread) prime table size.

In my posting I mentioned "interesting" recent
methods like Cuckoo and Compact Hash and got
little response. Is this because the
"Google Groups" kill-files me? (Or, because
people have taken to kill-filing
"James Dow Allen" specifically?)

> I'm not talking to
> you any more without my lawyer.

Well, have a nice day anyway!
James

user923005

unread,

Aug 17, 2007, 3:31:41 AM8/17/07

to

On Aug 16, 11:58 pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
> On Aug 13, 8:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> > Two bucks in one day? Hmmm. I'm not talking to
> > you any more without my lawyer.
>
> But I thought you had a huge supply of rounding-error
> halfpennies from your days hacking bank computers :-)
>
> > But seriously, there are also advantages in making the table size an
> > integer power of 2 (which, in all but one case, fails the primality
> > test with a vengeance), since the hash value can then be reduced to the
> > table size more rapidly.
>
> Gak! You're not coding ala Falconer are you?
> (Doing a division on every reprobe, instead
> of a simple add-compare-subtract.) Most hashers
> will "want to" divide by a prime for scrambling
> anyway, so the index reduction is free.

When table size is an integral power of 2, the address can be
calculated from the hash with a single AND operation.

> (BTW, when memory is at a premium one may wish
> to expand tables, when required, by less than 100%,
> precluding any guarantee of power-of-two.)
>
> Speaking of Falconer, his method (linear probing
> with second hash value used for probe increment)
> *requires* (even more so than quadratic probing,
> see upthread) prime table size.
>
> In my posting I mentioned "interesting" recent
> methods like Cuckoo and Compact Hash and got
> little response. Is this because the
> "Google Groups" kill-files me? (Or, because
> people have taken to kill-filing
> "James Dow Allen" specifically?)

I mentioned cuckoo hash also.
This one is easy to understand:
http://www.mpi-inf.mpg.de/~sanders/programs/cuckoo/
It is written in C++ but would be trivial to rewrite in C.

I do not know of any useful implementation of compact hashing that has
a license that would allow me to use it, and I have not read the paper
on compact hashing yet.

user923005

unread,

Aug 17, 2007, 4:01:27 AM8/17/07

to

/* Here is that cuckoo hash function converted to C */
/* Dann Corbit did the quick & dirty C conversion. */
/* */
/* K-ary cuckoo hashing (c) 2002-2003 by Peter Sanders */
/* this is a simple implementation for the purposes */
/* of measuring the performance of cuckoo hashing in abstract */
/* terms (number of probes per insert, storage efficiency) */
/* usage: compile using g++ or another ISO C++ compiler */
/* a.out <K> <n> <r> <repeat> <seed for rng> */
/* there is also a constant tMax that defines the maximum number */
/* of trials for an insertion */
/* allocates space for n elements in K subtables, and repeats */
/* the follwing measurements repeat time: */
/* - find a hash function by filling full lookup tables */
/* with pseudo-random numbers. */
/* - insert elements i=0..n-1 into the table until this fails */
/* for the first time. (The cost of these insertions is not */
/* measured.) */
/* Every n/r successfully inserted elements, the follwing */
/* measurement is repeated n/r times: */
/* * a random element i2 is deleted */
/* * the hash table entries for i2 are filled with new random values
*/
/* * i2 is reinserted. */
/* Note that this is equivalent to removing a random element */
/* inserting a new element that */
/* has never been in the table before. */
/* The output is a table that gives */
/* - x the number of elements in the table at each measuremnt interval
*/
/* - the average number of probes for an insertion during the
measruements */
/* for the given number of inserted elements */
/* - K */
/* - n */
/* - repeat */
/* - seed */
#define DEBUGLEVEL 1
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
#include "util.h"
#include "mt-real.c"

#define split 1

const int tMax = 10000; /* max number of random walk trials */
int K; /* number of probes */
int n; /* number of elements */
int m; /* number of elements per table */
int **hash; /* K times n array */
int **table; /* K times m array */

double limit(double x, double bound)
{
if (x > bound) {
return bound;
} else if (x < -bound) {
return -bound;
} else
return x;
}

double cpuTime()
{
return clock() * 1e-6;
}

/* generate random int in 0..x-1 */
int rand0K(int x)
{
return (int) (genrand() * x);
}

/* insert element i into table */
/* return value: */
/* -1 failure */
/* otherwise number of hash function evaluation */
int insert(int i)
{
int forbidden = -1;
int j = rand0K(K);
int t;
for (t = 1; t <= tMax; t++) {
int p = hash[j][i];
int newI = table[j][p];
table[j][p] = i; /* play the cuckoo */
if (newI == -1)
return t; /* done */
forbidden = j;
i = newI; /* find new home for cuckoo victim */
j = rand0K(K - 1);
if (j == forbidden)
j = K - 1;
}
return tMax + 1; /* insertion failed */
}

/* delete element i from the table */
void delete(int i)
{
int j;
for (j = 0; j < K; j++) {
int p = hash[j][i];
if (table[j][p] == i) {
table[j][p] = -1;
return;
}
}
}

int main(int argc, char **argv)
{
int i,
j;
int r;
int step;
int repeat;

int seed;
double *sumT;
int *cf;
int rep;

assert(argc == 6);
K = atoi(argv[1]); /* number of probes */
n = atoi(argv[2]); /* number of elements */
r = atoi(argv[3]); /* number of measured densities */
repeat = atoi(argv[4]); /* how often to start from scratch */
seed = atoi(argv[5]);
m = (int) (n / K + 0.5);
step = n / r;
sgenrand(seed);
puts("# x tAvg(x) K N repeat seed");

/* allocate hash function and table */
/* and an empty table */
hash = calloc(K, sizeof(int *));
table = calloc(K, sizeof(int *));
for (j = 0; j < K; j++) {
hash[j] = calloc(n, sizeof(int));
table[j] = calloc(m, sizeof(int));
}

/* initialize statistics */
/* sumT[i] is the average time for size i*step */
sumT = calloc(r + 1, sizeof(double));
cf = calloc(r + 1, sizeof(int));
for (i = 0; i < r; i++) {
sumT[i] = 0;
cf[i] = 0;
}

/* main loop */
for (rep = 0; rep < repeat; rep++) {
/* init hash function and empty table */
for (j = 0; j < K; j++) {
for (i = 0; i < n; i++) {
hash[j][i] = rand0K(m);
}
for (i = 0; i < m; i++) {
table[j][i] = -1;
}
}

/* fill table and keep measuring from time to time */
for (i = 0; i < n; i++) {
if (insert(i) > tMax)
break; /* table is full */
/* measure in detail here */
if (((i + 1) % step) == 0) {
int i1;
for (i1 = 0; i1 < step; i1++) {
int t;
/* delete & reinsert a random element */
int i2 = rand0K(i);
delete(i2);
for (j = 0; j < K; j++)
hash[j][i2] = rand0K(m);
t = insert(i2);
cf[i / step] += (t > tMax);
/* cout << t << endl; */
sumT[i / step] += t;
}
}
}
}

for (rep = 0; rep < r; rep++) {
printf("%d %f %d %d %d %d %d\n",
rep * step + step,
sumT[rep] / step / repeat,
K,
n,
repeat,
seed,
cf[rep]);
}
return 0;
}

/*
mt-real.c follows:
*/
/* A C-program for MT19937B: Real number version */
/* genrand() generate one pseudorandom number with double precision
*/
/* which is uniformly distributed on [0,1]-interval for each call.
*/
/* sgenrand(seed) set initial values to the working area of 624
words.*/
/* sgenrand(seed) must be called once before calling genrand()
*/
/* (seed is any integer except 0).
*/

/*
LICENCE CONDITIONS:

Matsumoto and Nishimura consent to GNU General
Public Licence

NOTE:
When you use it in your program, please let Matsumoto
<matu...@math.keio.ac.jp> know it.

Because of a machine-trouble, Matsumoto lost emails
which arrived during May 28-29.
*/

#include<stdio.h>

/* Period parameters */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0df /* constant vector a */
#define UPPER_MASK 0x80000000 /* most significant w-r bits */
#define LOWER_MASK 0x7fffffff /* least significant r bits */

/* for tempering */
#define TEMPERING_MASK_B 0x9d2c5680
#define TEMPERING_MASK_C 0xefc60000
#define TEMPERING_SHIFT_U(y) (y >> 11)
#define TEMPERING_SHIFT_S(y) (y << 7)
#define TEMPERING_SHIFT_T(y) (y << 15)
#define TEMPERING_SHIFT_L(y) (y >> 18)

static unsigned long ptgfsr[N]; /* set initial seeds: N = 624 words */

void
sgenrand(unsigned long seed) /* seed should not be 0 */
{
int k;

/* setting initial seeds to ptgfsr[N] using */
/* the generator Line 25 of Table 1 in */
/* [KNUTH 1981, The Art of Computer Programming */
/* Vol. 2 (2nd Ed.), pp102] */

ptgfsr[0]= seed & 0xffffffff;
for (k=1; k<N; k++)
ptgfsr[k] = (69069 * ptgfsr[k-1]) & 0xffffffff;
}

double
genrand()
{
unsigned long y;
static int k = 1;
static unsigned long mag01[2]={0x0, MATRIX_A};
/* mag01[x] = x * MATRIX_A for x=0,1 */

if(k == N){ /* generate N words at one time */
int kk;
for (kk=0;kk<N-M;kk++) {
y = (ptgfsr[kk]&UPPER_MASK)|(ptgfsr[kk+1]&LOWER_MASK);
ptgfsr[kk] = ptgfsr[kk+M] ^ (y >> 1) ^ mag01[y & 0x1];
}
for (;kk<N-1;kk++) {
y = (ptgfsr[kk]&UPPER_MASK)|(ptgfsr[kk+1]&LOWER_MASK);
ptgfsr[kk] = ptgfsr[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1];
}
y = (ptgfsr[N-1]&UPPER_MASK)|(ptgfsr[0]&LOWER_MASK);
ptgfsr[N-1] = ptgfsr[M-1] ^ (y >> 1) ^ mag01[y & 0x1];

k = 0;
}

y = ptgfsr[k++];
y ^= TEMPERING_SHIFT_U(y);
y ^= TEMPERING_SHIFT_S(y) & TEMPERING_MASK_B;
y ^= TEMPERING_SHIFT_T(y) & TEMPERING_MASK_C;
y &= 0xffffffff; /* you may delete this line if word size = 32 */
y ^= TEMPERING_SHIFT_L(y);

return ( (double)y / (unsigned long)0xffffffff );
}

/*
util.h follows:
*/
/*
// this files contains all the application independent little
// functions and macros used for the optimizer.
// In particular Peters debug macros and Dags stuff
// from dbasic.h cdefs, random,...

//////////////// stuff originally from
debug.h ///////////////////////////////
// (c) 1997 Peter Sanders
// some little utilities for debugging adapted
// to the paros conventions
*/

#ifndef UTIL
#define UTIL

#ifndef Max
#define Max(x,y) ((x)>=(y)?(x):(y))
#endif

#ifndef Min
#define Min(x,y) ((x)<=(y)?(x):(y))
#endif

#ifndef Abs
#define Abs(x) ((x) < 0 ? -(x) : (x))
#endif

#ifndef PI
#define PI 3.1415926535
#endif

#endif

Richard Heathfield

unread,

Aug 17, 2007, 6:25:57 AM8/17/07

to

James Dow Allen said:

> On Aug 13, 8:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>
>> But seriously, there are also advantages in making the table size an
>> integer power of 2 (which, in all but one case, fails the primality
>> test with a vengeance), since the hash value can then be reduced to
>> the table size more rapidly.
>
> Gak! You're not coding ala Falconer are you?

No, I don't think so. I don't actually use hash tables a great deal
(which is not to say that I never use them), but when I *do* use them,
I don't probe at all. I just use stretchy buckets instead (using trees
to handle the stretchy bit).

<snip>

Eric Sosman

unread,

Aug 17, 2007, 8:43:06 AM8/17/07

to

James Dow Allen wrote:
> [...]

> Speaking of Falconer, his method (linear probing
> with second hash value used for probe increment)
> *requires* (even more so than quadratic probing,
> see upthread) prime table size.

Well, not exactly: It requires only that the increment
between probes be relatively prime to the table size. A prime
table size makes this criterion easy to satisfy, but it's not
the only way. A table size equal to a power of two, for example,
works just fine if the probe increment is odd. (Degenerate
special case: pure linear probing with an increment of unity
works with any table size at all, prime or composite.)

Some people seem very interested in the relative speed
of AND vs MOD for this purpose, and on most machines AND
is faster or at the very least no slower. I think, though,
that the time savings must be viewed as just one component
of the search cost -- another piece (sometimes rather large)
is computing the key's hash value in the first place, not
to mention the cost of equality comparisons on the table
items encountered along the way.

(Slight digression): It's often suggested that those
equality comparisons can be made cheaper by storing hash
codes in the table entries along with the payload. Before
making a possibly expensive strcmp() or whatever, one can
first do a (fast) arithmetic comparison of the hashes: if
the hashes don't match, the objects mush be unequal and
it is unnecessary to make the expensive comparison. But this
stratagem doesn't seem to me to be a "Well, doh!" proposition;
at least two counter-arguments exist:

- If the table stores pointers to items rather than item
instances, storing hash values in the table roughly
doubles the amount of memory required. If the same
amount of memory were used for a pointers-only table,
the number of table slots would double and the load
factor would be halved. This should lead to shorter
searches -- and it is not at all clear _a priori_ that
more fast probes will outrace fewer slow probes.

- Storing the hash codes saves time whenever a probe hits
a slot occupied by the "wrong" key (it saves no time when
the probe finds the "right" key). If the time savings is
significant, it must mean that the key comparisons are
very expensive or that the probe sequences are very long.
If we've taken care to construct good hash functions and
have reason to believe that searches take two-point-mumble
probes, on the average, the savings is only one-point-mumble
comparisons per search.

As always when one is interested in speed, the gold standard
is the familiar "Measure, measure, measure!"

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

unread,

Aug 17, 2007, 11:30:46 AM8/17/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:87wsvw2...@bsb.me.uk...
>> "Malcolm McLean" <regn...@btinternet.com> writes:
>>
>> Since you complained about what you see as "trivial" criticisms, I
>> raised a bigger issue which you have not commented on, so I'll ask
>> yet again: what use is your (non) queue implementation? How could it
>> do anything but baffle a student? If you are worried about c.l.c
>> topicality, why not re-post on comp.programming where the wider issues
>> can be discussed?

<snip>

> A lot of algorithm books make a big meal of the basic data
> structures. I checked the Java queue classes before writing this post,
> and they were pretty complicated. I takes two interfaces, the
> Collection and the Iterable, and there are nine implementations -
> linked list queues, delay queues, and so on. Then you've got add
> methods, which throw execeptions if the queue is full, and offer
> methods which return an error code, so that you can decide whether the
> queue filling up is or is not part of normal operation. It's quite an
> impressive interface.

I am not asking you to over-complicate it so I don't see how anyone
else's complex interface has anything to do with it.

> However the structures chapter is onyl chapter two. I want to describe
> the simplicity of the structure instead of its complexity. I could
> have provided something similar to the Java classes, using void
> pointers and access functions. But it is unlikely that anyone would
> use such a function suite. If you want a trivial queue you have a
> buffer, a circular buffer if you don't want the inefficiency of
> shifting everything up on every remove operation. Just as you build
> linked lists by incorporating a "next" pointer in your structure, and
> trees built as you go.

Don't go there again. There is nothing wrong with using a fixed
circular buffer to implement a bounded queue. I never said there was
(and I have said it was fine before).

<snip>

> The whole chapter says "these are are basic data structures you will
> need, here are some code fragments which show how to implement
> them". The reader should be able to extend the expandable array to
> produce a dynamically expanding queue, add functions to check for
> length and spare capacity, and so on.

I think we will have to leave this argument. I am beginning to see
that we have different opinions about what makes a data structure and
what makes a good program. It seems you think it is fine that your
readers will have to re-invent one of the various methods to
distinguish between a full and an empty queue, and pepper their code
with references to the head and tail counters (like you do in your
program that uses a stack with no 'push', 'pop', 'empty' or 'full'
functions). They will also have to re-write the bits that you *did*
write, since you don't test for full or emptiness, in order to do
anything with it at all.

If you don't believe me, please just answer my original question and
show how to wrote a simple algorithm (any algorithm) using your queue
operations. If you do, you will see how you have move all sorts of
implementation details into the program the uses the queue. Of
course, you could also do it my ditching what you wrote in the book
and implementing a bounded queue properly. It is only a few lines
more code. There would be no need to make it complicated, just usable
and self-contained.

> An "empty" function is pretty
> patronising. The priority queue, implemented efficiently, is the red
> black tree, but it is appropriate to hold that back. Actually it is
> possible to speed up the deletion, I believe - a web seach on
> "priority queue" and "efficiency" turns up a slew of algorithms. The
> book is Basic Algorithms, covering a wide range of topics in outline,
> not a massively detailed description of any one.

You miss the point. I don't want it to be "massively detailed", but
clear, usable and self-contained would be nice.

--
Ben.

Richard Harter

unread,

Aug 17, 2007, 11:32:27 AM8/17/07

to

<OT>

On Fri, 17 Aug 2007 08:43:06 -0400, Eric Sosman
<eso...@ieee-dot-org.invalid> wrote:

>James Dow Allen wrote:
>> [...]
>> Speaking of Falconer, his method (linear probing
>> with second hash value used for probe increment)
>> *requires* (even more so than quadratic probing,
>> see upthread) prime table size.
>
> Well, not exactly: It requires only that the increment
>between probes be relatively prime to the table size. A prime
>table size makes this criterion easy to satisfy, but it's not
>the only way. A table size equal to a power of two, for example,
>works just fine if the probe increment is odd. (Degenerate
>special case: pure linear probing with an increment of unity
>works with any table size at all, prime or composite.)

In a thread in comp.programming (IIRCC) Paul Hsieh pointed out
that always using the same probe increment in a power of two
table size also has bad properties. His solution is to use the
unused bits of the full hash to select an increment size from a
table of primes.

[snip]

re: counterargument to storing the hash code.

> - If the table stores pointers to items rather than item
> instances, storing hash values in the table roughly
> doubles the amount of memory required. If the same
> amount of memory were used for a pointers-only table,
> the number of table slots would double and the load
> factor would be halved. This should lead to shorter
> searches -- and it is not at all clear _a priori_ that
> more fast probes will outrace fewer slow probes.

This doesn't sound right - ordinarily a table has to hold a
key/value pair. Adding a hash would increase the load factor by
50% - for a given amount of memory.

>
> - Storing the hash codes saves time whenever a probe hits
> a slot occupied by the "wrong" key (it saves no time when
> the probe finds the "right" key). If the time savings is
> significant, it must mean that the key comparisons are
> very expensive or that the probe sequences are very long.
> If we've taken care to construct good hash functions and
> have reason to believe that searches take two-point-mumble
> probes, on the average, the savings is only one-point-mumble
> comparisons per search.
>
> As always when one is interested in speed, the gold standard
>is the familiar "Measure, measure, measure!"

</OT>

Richard Harter

unread,

Aug 17, 2007, 11:35:23 AM8/17/07

to

On Fri, 17 Aug 2007 01:01:27 -0700, user923005
<dco...@connx.com> wrote:

>/* Here is that cuckoo hash function converted to C */
>/* Dann Corbit did the quick & dirty C conversion. */

Thanks for posting the code - much appreciated.

CBFalconer

unread,

Aug 17, 2007, 1:32:51 PM8/17/07

to

Richard Harter wrote:
> Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
>> James Dow Allen wrote:
>>
>>> [...]
>>> Speaking of Falconer, his method (linear probing with second
>>> hash value used for probe increment) *requires* (even more so
>>> than quadratic probing, see upthread) prime table size.
>>
>> Well, not exactly: It requires only that the increment
>> between probes be relatively prime to the table size. A prime
>> table size makes this criterion easy to satisfy, but it's not
>> the only way. A table size equal to a power of two, for example,
>> works just fine if the probe increment is odd. (Degenerate
>> special case: pure linear probing with an increment of unity
>> works with any table size at all, prime or composite.)
>
> In a thread in comp.programming (IIRCC) Paul Hsieh pointed out
> that always using the same probe increment in a power of two
> table size also has bad properties. His solution is to use the
> unused bits of the full hash to select an increment size from a
> table of primes.

That is dangerous. The purpose of the multiple hash is to have
different search patterns for different objects that have the same
primary hash pattern. This avoids clustering, and tends to keep
the search chain short. The very iffy thing in hashlib is the
reduction of the range of the secondary hash, which encourages
light clustering in nearly empty tables. The system attempts to
maintain tables at between 44 and 88% of capacity.

Also, I want the table action to be independent of the hash
functions (apart from speed), so the functions are computed every
time needed. The system functions with all the hash functions
replaced by the constant 1 (unity) (tested in the test suite) at a
considerable speed penalty. So the action is hash function
independent, but the speed is highly dependent.

The system is intended to isolate all these decisions in the hash
table code, so the user need not worry about them. Similarly the
algorithms for table expansion, etc. The result is a storage and
retrieval black box.

Malcolm McLean

unread,

Aug 17, 2007, 5:00:50 PM8/17/07

to

"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message

>> A lot of algorithm books make a big meal of the basic data
>> structures. I checked the Java queue classes before writing this post,

> I am not asking you to over-complicate it so I don't see how anyone
> else's complex interface has anything to do with it.
>

The Java class is not written by idiots. It is basically what you need for a
production generic queue class. However my code is not production code. It
is there to illustrate how to implement the algorithm.

> I think we will have to leave this argument. I am beginning to see
> that we have different opinions about what makes a data structure and
> what makes a good program. It seems you think it is fine that your
> readers will have to re-invent one of the various methods to
> distinguish between a full and an empty queue, and pepper their code
> with references to the head and tail counters (like you do in your
> program that uses a stack with no 'push', 'pop', 'empty' or 'full'
> functions). They will also have to re-write the bits that you *did*
> write, since you don't test for full or emptiness, in order to do
> anything with it at all.
>

The alternative to to write generic structures. Whilst that would be a
workable approach, in fact C doesn't lend itself to this. Many such
libraries have been written, none has gained acceptance of been included in
the standard library. Readers are better off understanding that these
relatively simple structures are fundamental, and will be implemented in a
wide variety of functions.

>
> If you don't believe me, please just answer my original question and
> show how to wrote a simple algorithm (any algorithm) using your queue
> operations.
>

That's not the intention. The queue takes only integers and has a hardwired
size. The code shows someone who doesn't know that queues are importnat data
structures how to implement one. It doesn't give nor is in intended to give
a generic "queue" class.

>
> If you do, you will see how you have move all sorts of
> implementation details into the program the uses the queue. Of
> course, you could also do it my ditching what you wrote in the book
> and implementing a bounded queue properly. It is only a few lines
> more code. There would be no need to make it complicated, just usable
> and self-contained.
>

Exactly. You go to the book, then you implement your own queue, as needed,
using the code in the book as a reference if necessary.

>
> You miss the point. I don't want it to be "massively detailed", but
> clear, usable and self-contained would be nice.
>

There are many generic queue libraries. None has gained wide acceptance.
Therefore the bar for "usable" is in fact very high. This is not true of
other languages, but it is true of C.

Malcolm McLean

unread,

Aug 17, 2007, 4:50:41 PM8/17/07

to

"CBFalconer" <cbfal...@yahoo.com> wrote in message

news:46C4E6E1...@yahoo.com...

> Malcolm McLean wrote:
>>
> ... snip ...
>>
>> /* Initializes DEQUE as an empty deque of elements ELEM_SIZE
>> bytes in size, with an initial capacity of at least
>> CAPACITY. Returns the initial deque data array. */
>> void *
>> deque_init (struct deque *deque, size_t capacity, size_t elem_size)
>> {
>> void *data = NULL;
>>
>> deque_init_null (deque);
>> if (capacity > 0) {
>> deque->capacity = 1;
>> while (deque->capacity < capacity)
>> deque->capacity <<= 1;
>> data = xnmalloc (deque->capacity, elem_size);
>> }
>> return data;
>> }
>
> Makes no sense, lacking descriptions of struct deque,
> deque_init_null, xnmalloc.
>

For it to be OK dequeue->capacity would have to be type guaranteed to be
wider than a size_t. There is no such type in standard C. So you can see
that the code is incorrect.
Actually calling with ~0 is a mistake a calling programmer is not too
unlikely to make. If the he knows he needs N-1 spaces in his queue for a
dental surgery, one patient on the chair and thus not in queue, then he
might pass in N -1 and forget for check for N == 0.

user923005

unread,

Aug 17, 2007, 5:28:16 PM8/17/07

to

On Aug 17, 2:00 pm, "Malcolm McLean" <regniz...@btinternet.com> wrote:
[snip]

> There are many generic queue libraries. None has gained wide acceptance.
> Therefore the bar for "usable" is in fact very high. This is not true of
> other languages, but it is true of C.

These queues have a large following:
http://www.microsoft.com/windowsserver2003/technologies/msmq/default.mspx
http://www-306.ibm.com/software/integration/wmq/

Here is a list of queues:
http://sourceforge.net/search/?type_of_search=soft&words=queue

I like this one:
http://sourceforge.net/projects/safmq/
It's C++ though, and is a lot more that most people probably need for
a simple fifo.

For those with simple needs, here is a thread-safe fifo:
http://fifoembed.sourceforge.net/
It's POSIX oriented, but it should be easy enough to convert to
generic use (POSIX threads for Windows could be used for that OS, for
instance).

Malcolm McLean

unread,

Aug 17, 2007, 5:38:59 PM8/17/07

to

"Richard Harter" <c...@tiac.net> wrote in message
news:46c5b702...@news.sbtc.net...

> This doesn't sound right - ordinarily a table has to hold a
> key/value pair. Adding a hash would increase the load factor by
> 50% - for a given amount of memory.
>

Normally you can derive the keys from the data item. So in C you would pass
in a function pointer to extract a key from a datum, or if you want to be a
little faster but less flexible, pass in a length and offset.
Unfortunately it make the table harder to use.

Ben Bacarisse

unread,

Aug 17, 2007, 7:47:53 PM8/17/07

to

"Malcolm McLean" <regn...@btinternet.com> writes:

> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
>>> A lot of algorithm books make a big meal of the basic data
>>> structures. I checked the Java queue classes before writing this post,
>> I am not asking you to over-complicate it so I don't see how anyone
>> else's complex interface has anything to do with it.
>>
> The Java class is not written by idiots. It is basically what you need
> for a production generic queue class. However my code is not
> production code. It is there to illustrate how to implement the
> algorithm.

But it is not an excuse for your useless example.

>> I think we will have to leave this argument. I am beginning to see
>> that we have different opinions about what makes a data structure and
>> what makes a good program. It seems you think it is fine that your
>> readers will have to re-invent one of the various methods to
>> distinguish between a full and an empty queue, and pepper their code
>> with references to the head and tail counters (like you do in your
>> program that uses a stack with no 'push', 'pop', 'empty' or 'full'
>> functions). They will also have to re-write the bits that you *did*
>> write, since you don't test for full or emptiness, in order to do
>> anything with it at all.
>>
> The alternative to to write generic structures.

No. The alternative is to write a functional, useful, simple,
fixed-length queue of integers (since those are the very sensible
restrictions you put on your example). I am not asking you to make it
generic (in types), or flexible (in size), just usable as your text
claims it to be.

Write a short example that uses your add and remove functions do
anything at all. You'll have to put half of the code for add and
remove (the bits that checks if adding or removing is possible) into
the algorithm. You can't be suggesting that is how one should use a
simple queue -- by putting the slightly tricky full and empty tests
outside of any queue operations? Of course, it is quite reasonable to
leave them out of the add and remove functions for efficiency,
provided there is a user note: "don't call add() unless is_full()
returns false" (and the mirror for the awkwardly named remove()). In
an introductory text, I'd have add() and remove() do these tests, but
if you don't do that they need to be provided as queue operations.

If you'd said, "now you just need to define is_full() and is_empty()
functions and you can start using this queue" that would be a fine
exercise (but I'd include an answer later since it is slightly tricky)
but you seem to be suggesting that it can be used as you have left it.

Maybe this is at the heart of out differences. Maybe you are happy
that a significant implementation issue (how to distinguish between a
full and an empty circular buffer) be peppered about the algorithms
that use the buffer. If so, it partially explains why you left the
queue unfinished, but I don't think it is a good example to set, and it
is a terrible way to write maintainable programs. You do it in the
stack example, so maybe that is what you expect.

If this is not the reason, then I am just at a loss. A book called
Basic Algorithms includes code for a queue. This code can't be used
in any basic algorithm I know without first finishing the design by
deciding how to distinguish between full and empty queues and then either

(a) putting this code, which tests variables that belong to the queue
all over the algorithms that use the queue;
(b) rewriting add and remove so they can signal failure and have them do
these tests; or
(c) finish the interface by adding these two testing operations.

In all these cases I think you need to tell your readers that the queue
is not finished.

>> If you don't believe me, please just answer my original question and
>> show how to wrote a simple algorithm (any algorithm) using your queue
>> operations.
>>
> That's not the intention. The queue takes only integers and has a
> hardwired size. The code shows someone who doesn't know that queues
> are importnat data structures how to implement one. It doesn't give
> nor is in intended to give a generic "queue" class.

I don't care that it not generic. Did I not say that enough yet? I
am curious that the intention was not that it be used. Given that
assumption, it is easier to write good books. Would it not have been
better to provide a simple but usable (allbeit limited) example? It
is only a few more lines of code. And in case you mistake my
meaning -- no need to make it generic or dynamic.

>> If you do, you will see how you have move all sorts of
>> implementation details into the program the uses the queue. Of
>> course, you could also do it my ditching what you wrote in the book
>> and implementing a bounded queue properly. It is only a few lines
>> more code. There would be no need to make it complicated, just usable
>> and self-contained.
>>
> Exactly. You go to the book, then you implement your own queue, as
> needed, using the code in the book as a reference if necessary.

Had you finished the implementation it would have been a more useful
reference. Is the "bug" in the code just the fact the you missed out
the sentence: "You need to make one further design decision and then
finish the implementation by writing is_full() and is_empty()
functions."?

>> You miss the point. I don't want it to be "massively detailed", but
>> clear, usable and self-contained would be nice.
>>
> There are many generic queue libraries. None has gained wide
> acceptance. Therefore the bar for "usable" is in fact very high. This
> is not true of other languages, but it is true of C.

Set the bar as low as you like. Yours is not usable without either
finishing it, rewriting it, or messing up the code that tries to use
it. I am talking about two functions of a line or two each. I can't
see why you think your "sketch" is more helpful without these or why
adding them is too complex for you to consider.

--
Ben.

Eric Sosman

unread,

Aug 17, 2007, 9:33:47 PM8/17/07

to

Richard Harter wrote:
> <OT>
> On Fri, 17 Aug 2007 08:43:06 -0400, Eric Sosman
> <eso...@ieee-dot-org.invalid> wrote:
>
>> James Dow Allen wrote:
>>> [...]
>>> Speaking of Falconer, his method (linear probing
>>> with second hash value used for probe increment)
>>> *requires* (even more so than quadratic probing,
>>> see upthread) prime table size.
>> Well, not exactly: It requires only that the increment
>> between probes be relatively prime to the table size. A prime
>> table size makes this criterion easy to satisfy, but it's not
>> the only way. A table size equal to a power of two, for example,
>> works just fine if the probe increment is odd. (Degenerate
>> special case: pure linear probing with an increment of unity
>> works with any table size at all, prime or composite.)
>
> In a thread in comp.programming (IIRCC) Paul Hsieh pointed out
> that always using the same probe increment in a power of two
> table size also has bad properties. His solution is to use the
> unused bits of the full hash to select an increment size from a
> table of primes.

I did not and do not imply that the probe increment must
be the same for all keys, nor even for all keys that probe
the same slot on the first attempt. This is one of the reasons
twin primes are popular as table sizes: you use the hash code
mod N for the first probe, and then if necessary you use the
hash mod (N-2) plus 1 as the increment for further probes. The
primality of N guarantees that the increment is relatively prime
to it; the primality of N-2 helps scramble groups of keys that
have simple relationships (sum1, sum2, sum3, ...).

> [snip]
>
> re: counterargument to storing the hash code.
>
>> - If the table stores pointers to items rather than item
>> instances, storing hash values in the table roughly
>> doubles the amount of memory required. If the same
>> amount of memory were used for a pointers-only table,
>> the number of table slots would double and the load
>> factor would be halved. This should lead to shorter
>> searches -- and it is not at all clear _a priori_ that
>> more fast probes will outrace fewer slow probes.
>
> This doesn't sound right - ordinarily a table has to hold a
> key/value pair. Adding a hash would increase the load factor by
> 50% - for a given amount of memory.

I think you said the same thing I did; perhaps we have a
confusion of terminology. (Not surprising, since "hash table"
itself is not "a" thing, but a wide and varied family of things.)

My point is that the choice between a pointer/hash pair and
a pointer alone is not clear-cut. Storing the pair allows one
to short-circuit the (possibly expensive) comparisons when the
search key's hash and the stored hash disagree. For the same
expenditure of memory, a pointer-only table could have twice as
many slots and a corresponding reduction in collision frequencies,
leading to fewer probes per search.

Hmmm: Maybe our disagreement is about where the key lives.
I've been assuming that the key is part of the item, so the
choice is between pointer/hash and pointer. But maybe you're
assuming the key is a separate entity, so the choice is between
key_pointer/data_pointer and key_pointer/data_pointer/hash. The
argument still holds with a different expansion factor: Storing
the hash now adds 50% to the table size, while leaving it out
allows 33% more slots and a load factor reduced by one-third.

If the table stores complete "records" rather than pointers
and if the records are significantly larger than hash values,
then storing the hashes exacts a proportionally smaller penalty
(flip side: declining to store them gains a proportionally smaller
advantage).

Richard Heathfield

unread,

Aug 18, 2007, 2:21:03 AM8/18/07

to

Ben Bacarisse said:

<snip>

> I'd have add() and remove() do these tests, but
> if you don't do that they need to be provided as queue operations.

I pointed out the name clash with <stdio.h>'s remove() some days - or
possibly even weeks - ago. That should have been plenty of time to
correct the electronic version of the book (given that MM seems to have
plenty of time to discuss it, so I presume he has enough time to fix it
too - the more important task).

So I went to have a look, to see if it had been fixed yet. (This is the
first time I've looked at the chapter.)

MM's A123 XYZ / B123 CDE example is wrong, in that it suggests checking
Slot 134. If that's empty, MM says, B123 CDE goes goes in Slot 124. I
can see no justification whatsoever for checking slot (S + 10) before
deciding whether to use Slot S.

I was puzzled to find that hash table entries were not abstracted as the
table itself was. "Divide and conquer" is too useful a strategy to be
ignored lightly, especially in teaching.

Oddly, MM seems to be using int rather than size_t for recording lengths
and sizes (neither of which can be negative). Presumably this is a
typo, since no experienced C programmer would deliberately be that
daft.

His malloc is not ideal - answer = malloc(sizeof *answer); would be
preferable. He repeats this mistake later in the (poorly named)
hashtable() function.

Good functions perform a task. They are doers. And they need "doing
words" (which we sometimes call "verbs") in their names. For example,
stdio's "remove" and "rename" and "fopen" and "printf" - all reasonably
good names (given their creator's habit of terseness).

The indentation is appalling, and led me to believe at first that
hashtable() always returns 0. I had to double back to check. Blech.

There's a reference to a strdup call, but as far as I can see there is
neither a definition of a strdup function nor indeed a call to one.
There is a call to a mystrdup function, but that isn't the same thing.
(Nor is it defined, so we have no idea what it does.)

I wonder what "menas" means.

MM seems to be thinking that "quadratic probing" means "square the hash
value" (and that's a direct quote). Either he's misunderstood quadratic
probing completely, or I have. As far as I'm concerned, quadratic
probing has nothing whatsoever to do with the hash value except as a
place to start looking for a slot. My understanding is that, if the
hash value h has so far resulted in n collisions in a table of size s,
then the place to look next is slot (h + n*n) % s - and most certainly
NOT (h * h) % s!

At this point, I started to lose interest, since it seems evident that
the chapter is so badly broken as to be beyond recovery - and I started
to skim. At random, I discovered an invasion of implementation
namespace, a correct claim that the primality test was interesting (but
alas, it is interesting for all the wrong reasons), an incorrect claim
that trial division is the only guaranteed way to determine whether a
number is prime, and a prime number function that wrongly decides 2 is
not a prime number.

Overall, I found the chapter to be of disappointingly low quality, and I
could not possibly recommend it. If that chapter is the advertisement,
the product itself must be dire indeed.

Malcolm McLean

unread,

Aug 18, 2007, 4:15:22 AM8/18/07

to

"Richard Heathfield" <r...@see.sig.invalid> wrote in message
news:34-dnT3gY7T...@bt.com...

> Ben Bacarisse said:
>
> <snip>
>
>> I'd have add() and remove() do these tests, but
>> if you don't do that they need to be provided as queue operations.
>
> I pointed out the name clash with <stdio.h>'s remove() some days - or
> possibly even weeks - ago. That should have been plenty of time to
> correct the electronic version of the book (given that MM seems to have
> plenty of time to discuss it, so I presume he has enough time to fix it
> too - the more important task).
>

Corrrections are being stored up for an edition 2, which will be released
shortly.

> MM's A123 XYZ / B123 CDE example is wrong, in that it suggests
> checking Slot 134. If that's empty, MM says, B123 CDE goes goes in Slot
> 124. I can see no justification whatsoever for checking slot (S + 10)
> before
> deciding whether to use Slot S.
>
> I was puzzled to find that hash table entries were not abstracted as the
> table itself was. "Divide and conquer" is too useful a strategy to be
> ignored lightly, especially in teaching.
>

There are two tables. One stores data items of any size, one stores
pointers.

>
> Oddly, MM seems to be using int rather than size_t for recording lengths
> and sizes (neither of which can be negative). Presumably this is a
> typo, since no experienced C programmer would deliberately be that
> daft.
>

It's an algorithms book. The algorithms work on integers. The hash function
then has to to return a size_t, which is confusing to people using the book
to understand how to port to another language.

>
> His malloc is not ideal - answer = malloc(sizeof *answer); would be
> preferable. He repeats this mistake later in the (poorly named)
> hashtable() function.
>

Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you will seldom
see in another environment. The reader has to perform a mental dereference
which makes it harder to see if the size of correct.

>
> Good functions perform a task. They are doers. And they need "doing
> words" (which we sometimes call "verbs") in their names. For example,
> stdio's "remove" and "rename" and "fopen" and "printf" - all reasonably
> good names (given their creator's habit of terseness).
>

My convention is to have an opaque structure, and a constructor with the
name of the structure in lower case. This is analagous to C++. As you say,
create_xxx would have the advantage of being a verb.

>
> The indentation is appalling, and led me to believe at first that
> hashtable() always returns 0. I had to double back to check. Blech.
>

Read the actual book - preview is available. The html is dumped form the
Open Office book files, unfortunately it doesn't understand code formatting.
I really ought to be able to find a way around that, I know.

>
> There's a reference to a strdup call, but as far as I can see there is
> neither a definition of a strdup function nor indeed a call to one.
> There is a call to a mystrdup function, but that isn't the same thing.
> (Nor is it defined, so we have no idea what it does.)
>

strdup() is discussed in chapter 1.

>
> I wonder what "menas" means.
>

typo.

>
> MM seems to be thinking that "quadratic probing" means "square the hash
> value" (and that's a direct quote). Either he's misunderstood quadratic
> probing completely, or I have. As far as I'm concerned, quadratic
> probing has nothing whatsoever to do with the hash value except as a
> place to start looking for a slot. My understanding is that, if the
> hash value h has so far resulted in n collisions in a table of size s,
> then the place to look next is slot (h + n*n) % s - and most certainly
> NOT (h * h) % s!
>

I'll have to check on this one.

>
> At this point, I started to lose interest, since it seems evident that
> the chapter is so badly broken as to be beyond recovery - and I started
> to skim. At random, I discovered an invasion of implementation
> namespace, a correct claim that the primality test was interesting (but
> alas, it is interesting for all the wrong reasons), an incorrect claim
> that trial division is the only guaranteed way to determine whether a
> number is prime, and a prime number function that wrongly decides 2 is
> not a prime number.
>
> Overall, I found the chapter to be of disappointingly low quality, and I
> could not possibly recommend it. If that chapter is the advertisement,
> the product itself must be dire indeed.
>

You seem determined to pick faults. The problem it then becomes distorting,
like Ben Bacarisse's comments. It is then impossible to know whether the
criticisms actually represent anything legitimate or are just fussing.

abstracted - is an entry of arbitrary size abstract enough? The keys, yes, I
admit it might have been better to go for non-strings.

size_t - yes, there is an obvious case, but you don't seem to have
considered the implications.

malloc(size *ptr) - that's not the convention in most places. The argument
for it is based on the needs of the compiler rather than the human reader.

verbs for function names. There is a case, but it is not obvious that you
are right. C++ doesn't agree.

indentations. Yes, it's my fault for not getting a decent HTML dumper. If I
edit the html

strdup - discussed in chapter 1.

menas - unambiguously valid criticism.

quadratic probing - you might be right here.

prime stuff - yes, that needs a bit of fixing, I will agree.

So actually most of the criticism dissolves. I am not saying that there are
not things you can say against the book, or that it couldn't have been done
better. However can you point to a better introductory web page on hash
tables? That's the real test.

user923005

unread,

Aug 18, 2007, 4:42:36 AM8/18/07

to

On Aug 18, 1:15 am, "Malcolm McLean" <regniz...@btinternet.com> wrote:
[snip]

> So actually most of the criticism dissolves. I am not saying that there are
> not things you can say against the book, or that it couldn't have been done
> better.

I don't understand how the criticism dissolves. I think that the
major issue is really the reaction to the criticism even more so than
the defects themselves.
The worst possible reaction is "The emperor's new clothes" reaction --
pretending that problems do not exist.
The best possible reaction is "Yes, I see the problem. I'll have it
fixed right away." -- and then actually fixing it right away.
Everyone makes mistakes, no matter how careful we are. But
recognizing our mistakes and correcting them is very, very important.
When creating a reference work, special effort should be made to
ensure correctness on the first pass. And diligent effort to improve
it is also highly important.

> However can you point to a better introductory web page on hash
> tables? That's the real test.

I suspect that something in this tangle will turn up worthwhile:
http://www.google.com/search?hl=en&q=introduction+to+hash+tables

Personally, I like this:
http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx
When the twisted minds of Julienne Walker and The EC Team get together
and write a book, I will buy a copy for sure.

Malcolm McLean

unread,

Aug 18, 2007, 5:58:18 AM8/18/07

to

"user923005" <dco...@connx.com> wrote in message
news:1187426556.8...@g4g2000hsf.googlegroups.com...

> On Aug 18, 1:15 am, "Malcolm McLean" <regniz...@btinternet.com> wrote:
> [snip]
>> So actually most of the criticism dissolves. I am not saying that there
>> are
>> not things you can say against the book, or that it couldn't have been
>> done
>> better.
>
> I don't understand how the criticism dissolves. I think that the
> major issue is really the reaction to the criticism even more so than
> the defects themselves.
> The worst possible reaction is "The emperor's new clothes" reaction --
> pretending that problems do not exist.
> The best possible reaction is "Yes, I see the problem. I'll have it
> fixed right away." -- and then actually fixing it right away.
> Everyone makes mistakes, no matter how careful we are. But
> recognizing our mistakes and correcting them is very, very important.
> When creating a reference work, special effort should be made to
> ensure correctness on the first pass. And diligent effort to improve
> it is also highly important.
>

I don't see how else I could react. Criticisms have been made, some of them
indisuptable bug reports, some of them very good points, some of them
defensible but not really good criticisms, some criticisms which fail to
understand the purpose of the code or the book, and some which are
objectively false.
That's what you would expect. However there is also a sort of assuption that
the book is no good. In fact I found a bug in Ben Pfaff's circular buffer
code within a couple of minutes, but no one was demanding that the code be
taken down. Nor did I report it in a silly way, like "this implementation
exhibits catastrophic and resource-hogging behaviour if size_t is used
beyond the range of a typical signed integer type". I just pointed it out,
as any sensible, experienced person does when making bug reports.

>
>> However can you point to a better introductory web page on hash
>> tables? That's the real test.
>
> I suspect that something in this tangle will turn up worthwhile:
> http://www.google.com/search?hl=en&q=introduction+to+hash+tables
>
> Personally, I like this:
> http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx
> When the twisted minds of Julienne Walker and The EC Team get together
> and write a book, I will buy a copy for sure.
>

It is a good description. The writing is rather dry and my car park metaphor
is more interesting. Also I describe how to do deletions with linear
probing. However it is a vey good explanation of hashing, it would be wrong
to say it is not.

The code works on global variables. Whilst this is acceptable for
demonstration code, it also tends to encourage bad programming habits.

This looks like a bug to me

(Quote recommended hash table website)

Insertion and deletion are equally simple relative to linear probing:

1void jsw_insert ( void *key, int len )
2...{
3 unsigned h = hash ( key, len ) % N;
4 unsigned step;
5
6 for ( step = 1; table[h] != NULL; ++step )
7 h = ( h + ( step * step - step ) / 2 ) % N;
8
9 table[h] = key;
10}
1void jsw_remove ( void *key, int len )
2...{
3 unsigned h = hash ( key, len ) % N;
4 unsigned step;
5
6 for ( step = 1; table[h] != NULL; ++step )
7 h = ( h + ( step * step - step ) / 2 ) % N;
8
9 table[h] = DELETED;
10}

You need to call the comparison function and break if equal. As it is the
code marks a null at the end of the cluster as deleted.

There are always problems. As you say, it depends how you react to them.
There are errors and weaknesses in Basic Algorithms, of course. However you
can tell by the tone that mostly these are used as a pretext for envy.

webs...@gmail.com

unread,

Aug 18, 2007, 6:58:20 AM8/18/07

to

On Aug 17, 10:32 am, CBFalconer <cbfalco...@yahoo.com> wrote:
> Richard Harter wrote:

> > Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> >> James Dow Allen wrote:
> >>> [...]
> >>> Speaking of Falconer, his method (linear probing with
> >>> second hash value used for probe increment) *requires*
> >>> (even more so than quadratic probing, see upthread)
> >>> prime table size.
> >>
> >> Well, not exactly: It requires only that the
> >> increment between probes be relatively prime to the table
> >> size. A prime table size makes this criterion easy to
> >> satisfy, but it's not the only way. A table size equal
> >> to a power of two, for example, works just fine if the
> >> probe increment is odd. (Degenerate special case: pure
> >> linear probing with an increment of unity works with any
> >> table size at all, prime or composite.)
>
> > In a thread in comp.programming (IIRCC) Paul Hsieh pointed

> > out [...]

Yeah it was probably me, since I am unable to find a citation from
anyone else that seems to be aware of this. Its as if people think
there is no need to think past Knuth.

> > [...] that always using the same probe increment in a

> > power of two table size also has bad properties. His
> > solution is to use the unused bits of the full hash to
> > select an increment size from a table of primes.
>
> That is dangerous. The purpose of the multiple hash is to
> have different search patterns for different objects that
> have the same primary hash pattern.

No, its to have a different search pattern for different object that
have the same primary hash *INDEX* (i.e., post masking). This is
obvious since, for any good hash function, index collisions will be
many many times more common than full hash function collisions.

Mathematically speaking, on average a single 32-bit (good) hash
function collision will occur roughly when 65536 entries appear, while
a hash index collision will occur roughly when the square root of the
table size number of entries occur (so for a 9 million entry table,
that's just 3000 entries).

> This avoids clustering, and tends to keep
> the search chain short. The very iffy thing in hashlib is
> the reduction of the range of the secondary hash, which
> encourages light clustering in nearly empty tables.

You are ignoring the fact that skip values of 2, 4 and 6, for example,
for indexes starting on even offsets near each other will tend to
collide with each other as the table fills up.

Same skip-value, and commensurate offset collisions must be minimized
by increasing the chance they miss each other. This happens most
easily if the skip values are predominantly co-prime, which is easiest
to construct when they are simply taken from a list of prime numbers.

Now if we imagine a rough worse case scenario of about a 17-sequence
longest chain, then that's at most 16 collisions from other chains. A
sample size of 16 from a random set will cause about 1 collision if
the random set is 256 entries in size. So a prime table of 256, 512
or 1024 entries should be fine (notice how this calculation is
independent of table size). And obviously a quick way of getting at
those tables is through the high bits of a full 32 bit hash function.

> [...] The

> system attempts to maintain tables at between 44 and 88% of
> capacity.
>
> Also, I want the table action to be independent of the hash
> functions (apart from speed), so the functions are computed
> every time needed. The system functions with all the hash
> functions replaced by the constant 1 (unity) (tested in the
> test suite) at a considerable speed penalty. So the action
> is hash function independent, but the speed is highly
> dependent.

I don't understand the point you are making. This is an interesting
robustness test that any well defined hash table should be able to
pass. Its a good idea, but independent of what's being discussed.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Flash Gordon

unread,

Aug 18, 2007, 7:53:40 AM8/18/07

to

Malcolm McLean wrote, On 18/08/07 10:58:

> "user923005" <dco...@connx.com> wrote in message
> news:1187426556.8...@g4g2000hsf.googlegroups.com...
>> On Aug 18, 1:15 am, "Malcolm McLean" <regniz...@btinternet.com> wrote:
>> [snip]
>>> So actually most of the criticism dissolves. I am not saying that
>>> there are
>>> not things you can say against the book, or that it couldn't have
>>> been done
>>> better.
>>
>> I don't understand how the criticism dissolves. I think that the
>> major issue is really the reaction to the criticism even more so than
>> the defects themselves.
>> The worst possible reaction is "The emperor's new clothes" reaction --
>> pretending that problems do not exist.
>> The best possible reaction is "Yes, I see the problem. I'll have it
>> fixed right away." -- and then actually fixing it right away.
>> Everyone makes mistakes, no matter how careful we are. But
>> recognizing our mistakes and correcting them is very, very important.
>> When creating a reference work, special effort should be made to
>> ensure correctness on the first pass. And diligent effort to improve
>> it is also highly important.
>>
> I don't see how else I could react.

Had you started reacting by either fixing the valid criticisms or
publishing an errata for them the reaction to your book would not
steadily get worse.

> Criticisms have been made, some of
> them indisuptable bug reports,

For which you have not published an errata so that your audience know
about them.

> some of them very good points,

For which you have not published an errata so that your audience knowa
about them.

> some of
> them defensible but not really good criticisms,

If it is a defensible criticism then it still needs to be dealt with.

> some criticisms which
> fail to understand the purpose of the code or the book,

It is to present and teach about basic algorithms. As far as I can see
everyone understands that, however as this group is about C rather than
algorithms a significant amount of the criticism has focused on the C
code, and this has been pointed out to you.

> and some which
> are objectively false.

This is no reason to dismiss all the other criticisms and not publish an
errata for them.

> That's what you would expect.

So why does it make you get defensive and start attacking those
criticising it rather than publishing an errata for the things you
accept are either wrong or could be improved?

> However there is also a sort of assuption
> that the book is no good.

No, it is not an assumption, it is an opinion formed by first reading
the material you present and then seeing your reaction to criticism of it.

> In fact I found a bug in Ben Pfaff's circular
> buffer code within a couple of minutes, but no one was demanding that
> the code be taken down.

You noted in your criticism that it was a failure that you would be
unlikely to hit in the real world, since in the real world you would not
be requesting a buffer large enough to hit the problem, and people did
not attack you for criticising it.

> Nor did I report it in a silly way, like "this
> implementation exhibits catastrophic and resource-hogging behaviour if
> size_t is used beyond the range of a typical signed integer type". I
> just pointed it out, as any sensible, experienced person does when
> making bug reports.

In my opinion peoples reaction to you has become more extreme because
you refuse to accept any criticism of anything you post.

>>> However can you point to a better introductory web page on hash
>>> tables? That's the real test.
>>
>> I suspect that something in this tangle will turn up worthwhile:
>> http://www.google.com/search?hl=en&q=introduction+to+hash+tables
>>
>> Personally, I like this:
>> http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx
>> When the twisted minds of Julienne Walker and The EC Team get together
>> and write a book, I will buy a copy for sure.
>>
> It is a good description. The writing is rather dry and my car park
> metaphor is more interesting. Also I describe how to do deletions with
> linear probing. However it is a vey good explanation of hashing, it
> would be wrong to say it is not.
>
> The code works on global variables. Whilst this is acceptable for
> demonstration code, it also tends to encourage bad programming habits.
>
> This looks like a bug to me

A valid criticism, so when will you fix this particular bug in your
stack and queue implementations? You are the one saying using globals is
a bug, so you cannot now claim it isn't.

> (Quote recommended hash table website)
>
> Insertion and deletion are equally simple relative to linear probing:

That is only one of several methods of doing insertion and deletion
presented in the text, and not even the first method of dealing with
clashes that is presented! Before mentioning the linear probing, it even
states there are several methods, and after the linear probing it goes
on to quadratic probing. So your criticism here is misleading at best.

> 1void jsw_insert ( void *key, int len )
> 2...{
> 3 unsigned h = hash ( key, len ) % N;
> 4 unsigned step;
> 5
> 6 for ( step = 1; table[h] != NULL; ++step )
> 7 h = ( h + ( step * step - step ) / 2 ) % N;
> 8
> 9 table[h] = key;
> 10}
> 1void jsw_remove ( void *key, int len )
> 2...{
> 3 unsigned h = hash ( key, len ) % N;
> 4 unsigned step;
> 5
> 6 for ( step = 1; table[h] != NULL; ++step )
> 7 h = ( h + ( step * step - step ) / 2 ) % N;
> 8
> 9 table[h] = DELETED;
> 10}
>
> You need to call the comparison function and break if equal. As it is
> the code marks a null at the end of the cluster as deleted.

Immediately after that code it says, "Keep in mind that the insertion
algorithm must also be modified..." so this is clearly presented as NOT
being a complete implementation of linear probing.

> There are always problems.

Yes, but at least some of your criticisms of that text look like they
were deliberately trying to make it look bad by, for example, saying
that it uses linear probing whilst ignoring that it does this just after
saying there are several methods and when it goes on to quadratic
immediately after.

> As you say, it depends how you react to them.

Also how you react to the criticisms and whether they are valid. You
have just presented invalid criticisms in an apparant attempt to make
your work look better.

> There are errors and weaknesses in Basic Algorithms, of course.

Yet you don't publish an errata for those you know know about and have
admitted.

> However
> you can tell by the tone that mostly these are used as a pretext for envy.

That is your opinion of peoples motives, but it certainly is not my
motive and I don't believe it is the motive of the other criticisms. I
just don't like bad code being presented or bad advice.
--
Flash Gordon

pete

unread,

Aug 18, 2007, 8:25:35 AM8/18/07

to

Flash Gordon wrote:
>
> Malcolm McLean wrote, On 18/08/07 10:58:
> > "user923005" <dco...@connx.com> wrote in message

> > In fact I found a bug in Ben Pfaff's circular
> > buffer code within a couple of minutes,
> > but no one was demanding that the code be taken down.
>
> You noted in your criticism that it was a failure that you would be
> unlikely to hit in the real world,
> since in the real world you would not
> be requesting a buffer large enough to hit the problem, and people did
> not attack you for criticising it.
>
> > Nor did I report it in a silly way, like "this
> > implementation exhibits catastrophic
> > and resource-hogging behaviour if
> > size_t is used beyond the range of a typical signed integer type". I
> > just pointed it out, as any sensible, experienced person does when
> > making bug reports.

I took a look at that,
but I don't hash and I didn't understand the code.

What was it that was being counted
by the size_t type object?

If it was the number of allocations,
then size_t is the wrong type.
The concept of what size_t is supposed to be,
is not related to how many allocated
objects can exist at the same time.
There's nothing in the standard which suggests
that there should be a type which is capable of holding
the number of allocated objects that can exist at one time.
The best type for counting allocations, is the largest unsigned type.

Or was it something else that was being counted
by the size_t type in Ben's circular buffer code?

--
pete