Structure as key containing char *

112 views
Skip to first unread message

Jon Scobie

unread,
Jul 3, 2018, 12:11:25 PM7/3/18
to uth...@googlegroups.com
I was wondering if there is an example which has a structure as the key and that structure contains items of char pointers?

viz.

typedef struct {
    int id;
    char *name;
    int something;
    char *sometext;
} key_t;

typedef struct {
    key_t key;
    char *value;
} map_t;


Do I need to construct a separate key as a char* and then use HASH_ADD_PTR?

Thanks,

Jon.


----
The information contained in this communication is private and confidential and may contain privileged material. It is intended solely for use by the recipient(s). Copying, distributing, disclosing or using any of the information in it or any attachments is strictly prohibited and may be unlawful.

Arthur O'Dwyer

unread,
Jul 3, 2018, 12:34:10 PM7/3/18
to uth...@googlegroups.com
On Tue, Jul 3, 2018 at 3:47 AM, 'Jon Scobie' via uthash <uth...@googlegroups.com> wrote:
I was wondering if there is an example which has a structure as the key and that structure contains items of char pointers?

viz.

typedef struct {
    int id;
    char *name;
    int something;
    char *sometext;
} key_t;

typedef struct {
    key_t key;
    char *value;
} map_t;

Do I need to construct a separate key as a char* and then use HASH_ADD_PTR?

So you want your hash to store elements of type `map_t`, and you want them to be hashed based on the value of field `key`?
This is tricky in C because C has no concept of "the value of."  When uthash hashes a key field, it isn't looking at "the value of" the field; it's actually just feeding a range of raw bytes into the selected hash function. This means that if you use `key` as your hash key,
- uthash will hash the WHOLE `key_t`, including the 4 garbage padding bytes after `id` and the 4 garbage padding bytes after `something`;
- uthash will hash the BYTES of pointers such as `name`, not the null-terminated strings they point to.
There is no easy way to tell uthash "Hey, when you hash this object, I want you to hash these 4 bytes, then skip 4 bytes, then treat these 8 as a pointer to a null-terminated string and hash that whole thing, then return to where you left off and hash those 4 bytes normally, then skip another 4..."  C++ is great at that. C macros, not so much.

The "uthash-thonic" way to do it would be to add an `unsigned hash;` field to your `key_t`, manually keep it up to date e.g. with

    unsigned key_recompute_hash(key_t *key) {
        unsigned result = 0;
        unsigned hashv = 0;
        HASH_JEN(&key->id, sizeof key->id, hashv); result += hashv;
        HASH_JEN(key->name, strlen(key->name), hashv); result += hashv;
        HASH_JEN(&key->something, sizeof key->something, hashv); result += hashv;
        HASH_JEN(key->sometext, strlen(key->sometext), hashv); result += hashv;
        return result;
    }

    void key_set_id(key_t *k, int id) {
        k->id = id;
        k->hash = key_recompute_hash(k);
    }

and then you can just use the subexpression `key.hash` as your hash key.

Another way to do it (although you might have to be cautious with this one) is to keep using `key` as your hash key, but change the hashing algorithm.

    #define HASH_FUNCTION(key, keylen, hashv) do { (hashv) = key_recompute_hash((key_t *)(key)); } while (0)
    #include <uthash.h>

uthash will observe that you've set a customized HASH_FUNCTION and use it for all hashing operations.
Of course you need to be careful with this one that you never perform any uthash operations, anywhere in this translation unit, on objects that are not of type `key_t`!

HTH,
Arthur

jon.s...@callsign.com

unread,
Jul 5, 2018, 4:31:56 AM7/5/18
to uthash
Arthur, thanks for the information. That almost worked but I had to make a modification to the comparison in HASH_FIND_IN_BKT. I don't actually know why there is a memcmp in there in the first place as if the hash and keylen match, isn't that enough?
A memcmp on a structure which contains pointers is never going to work.

Here is the patch and it works for me but obviously there might be a better way of doing this.

--- uthash.h.org 2018-07-05 09:29:02.124786760 +0100
+++ uthash.h 2018-07-05 09:26:49.339549754 +0100
@@ -833,9 +833,7 @@
} \
while ((out) != NULL) { \
if ((out)->hh.hashv == (hashval) && (out)->hh.keylen == (keylen_in)) { \
- if (uthash_memcmp((out)->hh.key, keyptr, keylen_in) == 0) { \
break; \
- } \
} \
if ((out)->hh.hh_next != NULL) { \
DECLTYPE_ASSIGN(out, ELMT_FROM_HH(tbl, (out)->hh.hh_next)); \


Jon.
--

Arthur O'Dwyer

unread,
Jul 5, 2018, 1:43:36 PM7/5/18
to uth...@googlegroups.com
On Thu, Jul 5, 2018 at 1:31 AM, jon.scobie via uthash <uth...@googlegroups.com> wrote:
On Tuesday, 3 July 2018 17:34:10 UTC+1, arthur.j.odwyer  wrote:
> On Tue, Jul 3, 2018 at 3:47 AM, 'Jon Scobie' via uthash <uth...@googlegroups.com> wrote:
>
> typedef struct {
>     int id;
>     char *name;
>     int something;
>     char *sometext;
> } key_t;
>
> typedef struct {
>     key_t key;
>     char *value;
> } map_t;
>
> Do I need to construct a separate key as a char* and then use HASH_ADD_PTR?
>
[...]

> Another way to do it (although you might have to be cautious with this one) is to keep using `key` as your hash key, but change the hashing algorithm.
>
>
>     #define HASH_FUNCTION(key, keylen, hashv) do { (hashv) = key_recompute_hash((key_t *)(key)); } while (0)
>     #include <uthash.h>
>
>
> uthash will observe that you've set a customized HASH_FUNCTION and use it for all hashing operations.
> Of course you need to be careful with this one that you never perform any uthash operations, anywhere in this translation unit, on objects that are not of type `key_t`!
 
Arthur, thanks for the information. That almost worked but I had to make a modification to the comparison in HASH_FIND_IN_BKT. I don't actually know why there is a memcmp in there in the first place as if the hash and keylen match, isn't that enough?

Ah, you're right (about the patch being needed).
Semi-fortunately, "uthash_memcmp" is a customization point just like "HASH_FUNCTION", so you can write
    g++ mything.c -DHASH_FUNCTION=key_hash -Duthash_memcmp=key_compare
and it will work. This is only "semi-fortunate" because the real rationale for uthash_memcmp is "portability to systems without memcmp in their libc"; we had not previously considered the idea of making it a customization point for key comparisons. But it's true that uthash_memcmp is only ever used to compare keys for equality! It should probably be renamed in the next version, I guess. I'll open an issue for discussion.

You're wrong about the redundancy of key comparison, though.
- Consider the case of a hash table with two buckets, containing three elements. By the pigeonhole principle, two of those elements must hash into the same bucket.
- "But we don't compare buckets, we compare hashes! A hash is 32 bits!"
- Consider the case of a hash table with N buckets, containing 4-billion-and-1 elements. By the pigeonhole principle, two of those elements must hash to the same 32-bit hash value.

For example, the strings "antiseption" and "Fletcherite" have the same HASH_JEN (namely 0x8c02a591), and the strings "diversionary" and "propenseness" have the same HASH_FNV (namely 0x091c4808).

As usual, if I'd just thought about what C++ does, it would have been obvious that you can't build a customized hash table just by customizing the uthash-equivalent-of-std::hash<T>. You naturally must also customize the uthash-equivalent-of-std::equal_to<T> — which at the moment is spelled "uthash_memcmp".

HTH,
Arthur

jon.s...@callsign.com

unread,
Jul 5, 2018, 1:51:54 PM7/5/18
to uthash
Ah. Absolutely. I knew it wouldn't be there just for the fun of it and thanks for the explanation. All makes perfect sense. I will implement an override instead and agree with the rename. Might be good to include another test for complex structure keys with pointers. Willing to do that if you want and many thanks for the great help.
Reply all
Reply to author
Forward
0 new messages