HASH changes

Juergen Boemmels

unread,

Nov 6, 2003, 7:20:17 AM11/6/03

to Perl6 Internals

Hi,

the latest changes of HASH in src/hash.c makes t/src/hash.t fail
badly. It did not even compile because of the prototype change in
hash_new. Ok that was easy to fix and I have already a fix in my tree,
which at least compiles cleanly.

But I still have one problem: hash_put and hash_get are not
symmetric. hash_put puts a void *value, but hash_get returns a
HASHBUCKET. This looks wrong to me. I have no problem with value
being a void* but hash_get should then return also a void *. If its
really needed to get a bucket it should be done with a function
hash_get_bucket. (But this really feels like exposing an
implementation detail).

Comments
boe
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Leopold Toetsch

unread,

Nov 6, 2003, 7:30:03 AM11/6/03

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels <boem...@physik.uni-kl.de> wrote:
> Hi,

> the latest changes of HASH in src/hash.c makes t/src/hash.t fail
> badly.

Oops, sorry (but at least it isn't the only broken test currently :)

> But I still have one problem: hash_put and hash_get are not
> symmetric. hash_put puts a void *value, but hash_get returns a
> HASHBUCKET. This looks wrong to me.

When returning the value directly, we couldn't have NULL value's - or
better NULL values and no entry at all would have the same return
result.

Additionally the returned bucket has the key inside, which could be
useful.

> boe

leo

Juergen Boemmels

unread,

Nov 6, 2003, 1:05:33 PM11/6/03

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Juergen Boemmels <boem...@physik.uni-kl.de> wrote:
> > Hi,
>
> > the latest changes of HASH in src/hash.c makes t/src/hash.t fail
> > badly.
>
> Oops, sorry (but at least it isn't the only broken test currently :)

I committed a partial fix. Now the tests are at least compiling.

> > But I still have one problem: hash_put and hash_get are not
> > symmetric. hash_put puts a void *value, but hash_get returns a
> > HASHBUCKET. This looks wrong to me.
>
> When returning the value directly, we couldn't have NULL value's - or
> better NULL values and no entry at all would have the same return
> result.

Yes this is a problem. I did a little research how other libraries
solve this problem. Oh man, there are many of them. Far the most
libraries return the value directly [1], some of them use a function
to check if the key exists and set the value part as a side effect [2]
and a third group uses an approach were creation is a side effect of
lookup (sometimes controlled with a flag) and always return an
key/value pair [3]. Glib for example use approaches [1] and [2].

Our current approach looks like a mixture of all these 3 cases.

From the principle of least surprise, I suggest to let hash_get return
a value pointer directly and have the ambiguity of NULL value and no
entry, and have a hash_get_bucket which returns a pointer to the
bucket.

void * hash_get(Interp * interpreter, HASH *hash, void *key);
HASHBUCKET * hash_get_bucket(Interp * interpreter, HASH *hash, void *key);

Some other note. Didn't we decided some time ago to use only ALLCAPS
types in cases were they are abbreviations. So the name would better
be Hash and HashBucket.

> Additionally the returned bucket has the key inside, which could be
> useful.

In cases where two keys compare equal but are not equal bitwise this
is indeed nice.

bye
boe

[1]
http://oss.software.ibm.com/cvs/icu/icu/source/common/uhash.h?rev=1.24
http://developer.gnome.org/doc/API/2.0/glib/glib-Hash-Tables.html
http://apr.apache.org/docs/apr/group__apr__hash.html
http://www.w3.org/Library/src/HTHash.html
http://mission.base.com/peter/source/pbl/doc/
http://judy.sourceforge.net/JudySL_3x.htm
http://www.ipd.bth.se/ska/sim_home/libghthash_mk2_doc/ght__hash__table_8h.html
http://dodgysoftware.net/ht/
http://chilli.degnet.de/~e_decker/hashtable/

[2]
http://developer.gnome.org/doc/API/2.0/glib/glib-Hash-Tables.html
http://www-2.cs.cmu.edu/~bwolen/fmcad98/packages/tiger/tgrlib/utilhash.html
http://www.ohse.de/uwe/strhash/libstrhash.html
http://www.stalphonsos.com/~attila/libgram/docs/hash_8h.html

[3]
http://www.gnu.org/manual/bfd-2.9.1/html_node/bfd_43.html
http://www.gnu.org/software/libc/manual/html_node/Hash-Search-Function.html
http://www.tcl.tk/man/tcl8.2.3/TclLib/Hash.htm

Jeff Clites

unread,

Nov 6, 2003, 1:48:45 PM11/6/03

to Juergen Boemmels, perl6-i...@perl.org, l...@toetsch.at

On Nov 6, 2003, at 10:05 AM, Juergen Boemmels wrote:

>>> But I still have one problem: hash_put and hash_get are not
>>> symmetric. hash_put puts a void *value, but hash_get returns a
>>> HASHBUCKET. This looks wrong to me.
>>
>> When returning the value directly, we couldn't have NULL value's - or
>> better NULL values and no entry at all would have the same return
>> result.

...

> From the principle of least surprise, I suggest to let hash_get return
> a value pointer directly and have the ambiguity of NULL value and no
> entry, and have a hash_get_bucket which returns a pointer to the
> bucket.
>
> void * hash_get(Interp * interpreter, HASH *hash, void *key);
> HASHBUCKET * hash_get_bucket(Interp * interpreter, HASH *hash, void
> *key);

And to amplify what you are saying, I think it's okay for hash_get to
be ambiguous as to whether or not the key exists, as long as there is a
separate method for that--hash_key_exists or something. (I can't think
of a case in user code where you'd need to do both in one call--in
perl5 for example "exists" is a separate operator from hash lookup.)

An alternative (which may or may not a good one) would be to say that
you can never actually store a NULL in a hash--at most you could store
a PerlUndef or Null or whatever null-ish PMC. Then a NULL return
unambiguously means that the key wasn't found. I could probably argue
either for or against this.

JEff

Leopold Toetsch

unread,

Nov 7, 2003, 2:53:27 AM11/7/03

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels <boem...@physik.uni-kl.de> wrote:

> Yes this is a problem. I did a little research how other libraries
> solve this problem. Oh man, there are many of them.

"little" :)

BTW, inside parrot:

$ find . -name '*.c' | xargs grep hash_get

and you'll see another hash implementation.

> void * hash_get(Interp * interpreter, HASH *hash, void *key);
> HASHBUCKET * hash_get_bucket(Interp * interpreter, HASH *hash, void *key);

That's ok for me. For all of the "normal" Hash usage, we don't have NULL
values, so there is no ambuigity for the return value.

> Some other note. Didn't we decided some time ago to use only ALLCAPS
> types in cases were they are abbreviations. So the name would better
> be Hash and HashBucket.

Yep.

>> Additionally the returned bucket has the key inside, which could be
>> useful.

> In cases where two keys compare equal but are not equal bitwise this
> is indeed nice.

Yes, you could have e.g. case-ignore compare/hash functions.

> bye
> boe

leo

Leopold Toetsch

unread,

Nov 7, 2003, 2:56:00 AM11/7/03

to Jeff Clites, perl6-i...@perl.org

Jeff Clites <jcl...@mac.com> wrote:

> An alternative (which may or may not a good one) would be to say that
> you can never actually store a NULL in a hash

Not good. The Hash (now) may take almost arbitrary key/value pairs.
Value can be plain ints ...

> JEff

leo

Juergen Boemmels

unread,

Nov 7, 2003, 8:21:17 AM11/7/03

to perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> BTW, inside parrot:
>
> $ find . -name '*.c' | xargs grep hash_get
>
> and you'll see another hash implementation.

The disadvantage of this finds is that you get many generated files
like core_ops_cg.c or classes/perlhash.c. To solve this problem,
I have written some little helper script:

manigrep.pl

It just takes the MANIFEST and walks all files in it searching for a
match. It can match fixed strings or /regular expressions/ and in
contrast to the normal grep it also adds the line numbers.

Do you think this tool should enter tools/dev ?
bö

manigrep.pl

Juergen Boemmels

unread,

Nov 7, 2003, 10:10:57 AM11/7/03

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

[...]

> > void * hash_get(Interp * interpreter, HASH *hash, void *key);
> > HASHBUCKET * hash_get_bucket(Interp * interpreter, HASH *hash, void *key);
>
> That's ok for me. For all of the "normal" Hash usage, we don't have NULL
> values, so there is no ambuigity for the return value.

Attached patch implements this. Furthermore it implement hash_exists.
It does not break more tests than previously.

Is this ok to commit?
bö

hash.diff

Leopold Toetsch

unread,

Nov 7, 2003, 8:55:38 AM11/7/03

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels wrote:

>
> manigrep.pl

>
> Do you think this tool should enter tools/dev ?

Proabably yes. OTOH, syntax should better be like grep and I'm really
often using -i or/and -w switches.

> bö

leo

Leopold Toetsch

unread,

Nov 7, 2003, 10:22:20 AM11/7/03

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels <boem...@physik.uni-kl.de> wrote:

> Attached patch implements this. Furthermore it implement hash_exists.
> It does not break more tests than previously.

> Is this ok to commit?

Yes for me. While touching these files, you could rename HASH{,BUCKET}
too to ucfirst lc $_.

Thanks,
leo

Uri Guttman

unread,

Nov 7, 2003, 11:21:09 AM11/7/03

to Juergen Boemmels, perl6-i...@perl.org

>>>>> "JB" == Juergen Boemmels <boem...@physik.uni-kl.de> writes:

>> $ find . -name '*.c' | xargs grep hash_get

JB> The disadvantage of this finds is that you get many generated
JB> files like core_ops_cg.c or classes/perlhash.c. To solve this
JB> problem, I have written some little helper script:

JB> manigrep.pl

JB> It just takes the MANIFEST and walks all files in it searching for
JB> a match. It can match fixed strings or /regular expressions/ and
JB> in contrast to the normal grep it also adds the line numbers.

JB> Do you think this tool should enter tools/dev ? bö

i wrote something that does the same thing but under Module::Build. but
as parrot is still using Makefile.PL it won't be directly useful. i also
wrote my own manifest slurp support since the one supplied with perl (i
forget which module has it) reads the manifest into a hash and loses any
file ordering. i use this manifest search to run my tests in the order
found in the manifest. this makes it easier than naming tests with
numeric prefixes, etc.

is there any thought about moving over to module::build? it is going
into perl5 core in 5.10 i believe. and there is a makefile.pl
compatibilty call-through mode as well.

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Jeff Clites

unread,

Nov 9, 2003, 9:44:43 PM11/9/03

to l...@toetsch.at, perl6-i...@perl.org

Okay, that makes sense then. So is the current implementation
incomplete? I ask because it looks like the mark functionality won't
yet quite deal with non-PObj values, etc. Also, since key and value are
now void*'s, there'll be a problem on platforms where sizeof(void *) !=
sizeof(other types), right? (In particular, doubles will usually be
larger than pointers.)

JEff

Leopold Toetsch

unread,

Nov 10, 2003, 3:34:28 AM11/10/03

to Jeff Clites, perl6-i...@perl.org

Jeff Clites wrote:

> Okay, that makes sense then. So is the current implementation
> incomplete? I ask because it looks like the mark functionality won't yet
> quite deal with non-PObj values, etc.

The mark_key (and hash, compare) functions can be passed to the extended
new_hash_x() creation function. If mark_key is NULL keys aren't marked.
Values are only marked if the value entries are PMCs or STRINGs.

> ...Also, since key and value are now

> void*'s, there'll be a problem on platforms where sizeof(void *) !=
> sizeof(other types), right? (In particular, doubles will usually be
> larger than pointers.)

No. This isn't finished yet, but another creation param can set the size
of the value store (if we ever need that ...)

>
> JEff

leo