Writable source equals Reusable source

James Dow Allen

Aug 18, 2005, 4:24:54 AM
In earlier threads I've tried to espouse a key idea:

Modifying Flexible Source-Code is the Key to Reusability

Perhaps I should rephrase this as

Source = Writable Source = Reusable Source = Force

* * * * * * * * * * * * * * * * *

We need a specific programming task as an example for discussion.

Hash table management has several advantages as such an example:
-- hash tables are ubiquitous (especially in software
optimized for speed).
-- hash tables give a better example than, say, list primitives,
which are not complicated enough for useful discussion or
for significant economic return from reuse.
-- hash tables give an easier example than, say, text
presentation, which has too many modes and modalities for
ease of discussion or reusability.
-- in my own experience, coding often involves standard library
routines (e.g. random(), qsort()), cut-and-pastes, and
writes-from-scratch. But with hash table management, the
use of flexible reusable software becomes compelling.

As explained in previous threads, the two widely espoused
hash table managers (Gnu hsearch() and Chuck's hashlib)
are often inadequate for my needs. I do cheerfully stipulate
that Chuck's is preferable to Gnu's hsearch().

* * * * * * * * * * * * * * * * *

When I was first exposed to Unix wizards I was impressed
with some of their magical powers. Soon I realized they
were making a copy of standard BSD sources and modifying
them, thereby producing custom software quickly.

The key point is that the object files were not the reusable
components; instead, modifying the source files was the key
to rapid development.

This is in stark contrast to conventional thinking, which seeks
a ``full-featured'' routine with unchanging source code.
In my hash-table manager -- HashTab (tm) -- I ``have my cake and
eat it too'' since the changing parts of the routine are isolated
in a header file, and most of the code doesn't change.
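
In outline, the arrangement is something like this (a minimal
sketch with invented names; this is not HashTab's actual header):

#include <stddef.h>

/* hashtab_cfg.h -- everything the application is expected to
 * change lives here; the generic probing/growth code #includes
 * this header and itself never changes. */

typedef unsigned long Key;

typedef struct {
    Key   key;              /* lookup key                        */
    short count;            /* application-specific payload      */
} Entry;

#define HASH(k)      ((size_t)(k) * 2654435761UL)  /* Knuth-style */
#define KEY_EQ(a, b) ((a) == (b))
#define IS_EMPTY(e)  ((e)->key == 0)   /* key 0 reserved: empty  */

The fixed .c file does its probing and growth purely in terms of
HASH, KEY_EQ, and IS_EMPTY, so per-project variation means
editing this one header.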

* * * * * * * * * * * * * * * * *
Referring to
http://www.recmath.com/contest/fullstandings.php

This advantage was brought home again in the latest Al Zimmermann
Programming Contest, where it is convenient to maintain a record
(hash table) of numbers previously encountered within a grid
under construction. In my solution any hash entry will be in one
of the following states:
* empty
* composite
* composite, pseudo-prime
* prime not occurring in current grid
* prime occurring 1, 2, ..., 32767 times

Entries for primes occurring in the grid must be maintained for
my algorithm to be valid, but all other entries can be silently
discarded to limit table growth (though composite pseudo-primes
are worth retaining for performance).
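
Schematically, the distinction might look like this (invented
names; the actual contest code differs):

enum state {
    ST_EMPTY,
    ST_COMPOSITE,
    ST_PSEUDOPRIME,         /* composite, pseudo-prime           */
    ST_PRIME_ABSENT,        /* prime, not in the current grid    */
    ST_PRIME_PRESENT        /* prime, occurring `count` times    */
};

struct entry {
    unsigned long value;
    enum state    state;
    short         count;    /* 1..32767; ST_PRIME_PRESENT only   */
};

/* A "soft" entry may be silently evicted to cap table growth.
 * Entries for primes occurring in the grid must never be
 * dropped; pseudo-primes are soft, but an eviction policy would
 * do well to prefer dropping the others first. */
static int is_soft(const struct entry *e)
{
    return e->state != ST_PRIME_PRESENT;
}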

If I'd been using either of the two widely espoused
solutions (Gnu hsearch, or the superior Chuck's hash)
I suppose I'd have been puzzled briefly, then finessed
the issue, or written hash-from-scratch. Having been
there before many times, I had my own reusable HashTab
and as usual it was trivial to modify.

I'm sure Chuck would have been happy and competent to hack
his code to support "soft entries", but what if he'd been
on vacation in Tahiti?

* * * * * * * * * * * * * * * * *

Regarding the present Al Zimmermann Programming Contest,
the details of the hash table management may not be of
great importance. An algorithmic approach which can be
inferred from the above discussion *is* important,
and some contestants may have overlooked it.

I am cunningly divulging this idea now, early enough so
that other contestants may choose to modify their algorithms,
but late enough so that such modifications will turn out
to be a net loss of time.

James Dow Allen

CBFalconer

Aug 18, 2005, 10:55:52 AM
James Dow Allen wrote:
>
... snip ...

>
> As explained in previous threads, the two widely espoused
> hash table managers (Gnu hsearch() and Chuck's hashlib)
> are often inadequate for my needs. I do cheerfully stipulate
> that Chuck's is preferable to Gnu's hsearch().

Thanks for the kind words. The package is still available at:

<http://cbfalconer.home.att.net/download/hashlib.zip>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

webs...@gmail.com

Aug 19, 2005, 2:02:02 PM
James Dow Allen wrote:
> In earlier threads I've tried to espouse a key idea:
>
> Modifying Flexible Source-Code is the Key to Reusability
>
> Perhaps I should rephrase this as
>
> Source = Writable Source = Reusable Source = Force
> [...] snip

Ok, but it sounds like you are generalizing a lot from a single
example.

1) Are you sure that your solution of simply mutating some old source
code is the best possible example for what you were doing? Or is it
just the best you happen to have on hand?

2) Is solving some programming contest comparable to solving real world
programming problems? (Where code might need to be documented,
maintained, or fit some other criteria.)

3) Can you generalize this strategy to other programming structures?
If so, how well does the idea of keeping "demo source" lying around,
to be customized as required, scale to all programming problems?

4) Does your approach make unwarranted assumptions about programmer
skill level? If someone wanted to use your custom hash table sources
to make their own, for example, what kind of learning curve would they
have to go through?

5) Let me propose a very simple problem for you. I want to implement a
cookie database for my web browser. The cookies will come in the form
of the following 4-tuple: (file, host, key, value). So what I want is
a sequence of nested hash tables, indexed first by the filename,
then the host, then the key, which will finally yield the value to me.
Does your hash table design, which as I recall was based on static
instances, scale to solve a problem like this? Note that this is not 3
hash tables -- it's 3 *LEVELS* of hash tables. (I think CBFalconer's
can do this, so long as you have fewer than about 8 million of any
given index.)
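
For concreteness, the lookup I have in mind is shaped like this
(sketched against an invented generic-table interface; this is
not hashlib's actual API):

#include <stddef.h>

typedef struct table Table;               /* opaque generic table */
Table *tbl_new(void);                     /* assumed primitives,  */
void  *tbl_get(Table *t, const char *k);  /* not a real library   */
void   tbl_put(Table *t, const char *k, void *v);

/* file -> host -> key -> value: each level's *value* is itself
 * a table, which is why it is 3 levels, not 3 tables. */
const char *cookie_get(Table *db, const char *file,
                       const char *host, const char *key)
{
    Table *by_host = tbl_get(db, file);
    Table *by_key  = by_host ? tbl_get(by_host, host) : NULL;
    return by_key ? tbl_get(by_key, key) : NULL;
}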

> When I was first exposed to Unix wizards I was impressed
> with some of their magical powers. Soon I realized they
> were making a copy of standard BSD sources and modifying
> them, thereby producing custom software quickly.

This, BTW, is how the thoroughly obsolete System V malloc libraries
temporarily made it into the Linux kernel before being summarily
expunged (and it was one of the discredited examples of "copying" in
the SCO versus IBM case).

> The key point is that the object files were not the reusable
> components; instead modifying the source files for reuse
> was the key for rapid development.

This is a very old-style, C-centric way of thinking. C++'s STL, for
example, does not primarily present static .obj files as part of its
base interface. Very dynamic data structure description is quite
feasible because the STL is fundamentally a source-level template
system. Even in C, you can always use macros as a direct substitute
for exactly this, as sketched below.
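
A trivial sketch of the macro approach (illustrative only, with a
linear search standing in for hashing to keep it short):

#include <stddef.h>

/* One expansion = one concrete instantiation, the C analogue of
 * a template: the reusable artifact is source text, not a .obj. */
#define DEFINE_LOOKUP(Name, KeyT, ValT, EQ)                       \
    typedef struct { KeyT key; ValT val; } Name##_pair;           \
    static ValT *Name##_find(Name##_pair *a, size_t n, KeyT k) {  \
        size_t i;                                                 \
        for (i = 0; i < n; i++)                                   \
            if (EQ(a[i].key, k)) return &a[i].val;                \
        return NULL;                                              \
    }

#define INT_EQ(x, y) ((x) == (y))
DEFINE_LOOKUP(itab, int, double, INT_EQ)  /* itab_pair, itab_find */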

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

CBFalconer

Aug 19, 2005, 2:42:19 PM
webs...@gmail.com wrote:
>
... snip ...

>
> 5) Let me propose a very simple problem for you. I want to implement a
> cookie database for my web browser. The cookies will come in the form
> of the following 4-tuple: (file, host, key, value). So what I want is
> a sequence of nested hash tables, indexed first by the filename,
> then the host, then the key, which will finally yield the value to me.
> Does your hash table design, which as I recall was based on static
> instances, scale to solve a problem like this? Note that this is not 3
> hash tables -- it's 3 *LEVELS* of hash tables. (I think CBFalconer's
> can do this, so long as you have fewer than about 8 million of any
> given index.)

Hashlib can easily change that limit in one place in the source.
Since a record will rarely be appreciably less than 20-odd bytes,
that limit already allows for about 160 MB of storage (8 million
entries at roughly 20 bytes each). Larger values are very likely
to start using virtual memory, and to thrash once the contents
grow large enough, with quite serious degradation of performance.
That would not be pleasant when used for, e.g., a compiler's
symbol tables.
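
Schematically, it amounts to this (not hashlib's actual source;
the real identifier differs):

/* A single compile-time constant caps the table size; editing
 * it is the "one place in the source". */
#define MAXSLOTS (1UL << 23)   /* ~8 million entries */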

Randy Howard

Aug 19, 2005, 6:34:04 PM
CBFalconer wrote
(in article <430624FD...@yahoo.com>):

> Hashlib can easily change that limit in one place in the source.
> Since a record will rarely be appreciably less than 20-odd bytes,
> that limit already allows for about 160 MB of storage.

Which isn't all that much by 2005 standards really.

> Larger
> values are very likely to start using virtual memory, and thrash
> when the content becomes large enough, with quite serious
> degradation of performance.

This obviously depends a great deal on the target solution.
If it is for mass consumption, then worrying about memory space
(while the rest of the industry is gobbling it up like mad) may
make some sense. But if you know the app will be hosted on a
modern system, especially now that even swapping pain can be
minimized by large RAID arrays, then it's not a problem. You
can buy a production system with 4GB of memory today for less
than the price of a computer I once had to hand-solder -- every
cap, resistor, and memory socket -- and that maxed out at 64K
of RAM. It's also conceivable to have an SMP and/or dual-core
based server with 64GB of memory for quite a bit more money,
but when you need it, it's available. 160MB doesn't sound like
much stacked up against that. :-)

--
Randy Howard (2reply remove FOOBAR)
