Hashtab 6.0.0.34

Chris Richard

Aug 3, 2024, 5:19:45 PM

to inmholavxai

tl;dr: Many feature requests are rejected by R-core because of the maintenance burden, but hashtab (R >= 4.2.0) was not. ?hashtab claims to efficiently associate keys with values. Many other implementations (hash, r2r, hashmap, ...) exist, as do environments and user-friendly extensions to them (rlang, RC, R6, ...). Other than object obfuscation and arbitrary keys, I have not found an obvious use case where hashtab is more efficient than the alternatives.

Hash tables are a data structure for efficiently associating keys with values. Hash tables are similar to environments, but keys can be arbitrary objects. Like environments, and unlike named lists and most other objects in R, hash tables are mutable, i.e., they are not copied when modified and assignment means just giving a new name to the same object.

New hash tables are created by hashtab. Two variants are available: keys can be considered to match if they are identical() (type = "identical", the default), or if their addresses in memory are equal (type = "address"). The default "identical" type is almost always the right choice. The size argument provides a hint for setting the initial hash table size. The hash table will grow if necessary, but specifying an expected size can be more efficient.

maphash calls FUN for each entry in the hash table with two arguments, the entry key and the entry value. The order in which the entries are processed is not predictable. The consequence of FUN adding entries to the table or deleting entries from the table is also not predictable, except that removing the entry currently being processed will have the desired effect.

External pointer objects are compared as reference objects, corresponding to calling identical() with extptr.as.ref = TRUE. This ensures that hash tables with keys containing external pointers behave reasonably when serialized and unserialized.

As an experimental feature, the element operator [[ can also be used to get or set hash table entries, and length can be used to obtain the number of entries. It is not yet clear whether this is a good idea.

It seems like there are several logic errors. Break it down piece by piece and make sure it all works. Should other things be inside the first user foreach loop? $hashtab gets overwritten each time through the first loop. $hashtab is then being added to $csvoutput one time at the bottom. What is $count for?

I would like to export all users I find to a CSV with chosen columns and values. I would also like to look up all computers that belong to each user and add them dynamically as columns and values. I can find a user's computers by matching the user's mail attribute against the computer's extension attribute.
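The fix the reply points to (build a fresh row for each user inside the loop and append it there, rather than once at the bottom) might look roughly like the sketch below. It is written in Python rather than PowerShell, purely to show the loop structure. The sample data, the field name extensionAttribute1, and the Computer1, Computer2, ... column names are all assumptions made for illustration, not taken from the original script.

```python
import csv
import io

# Hypothetical sample data; attribute names are assumptions for illustration.
users = [
    {"name": "Ann", "mail": "ann@example.com"},
    {"name": "Bob", "mail": "bob@example.com"},
]
computers = [
    {"name": "PC-01", "extensionAttribute1": "ann@example.com"},
    {"name": "PC-02", "extensionAttribute1": "ann@example.com"},
    {"name": "PC-03", "extensionAttribute1": "bob@example.com"},
]


def build_rows(users, computers):
    """Return one dict per user, with one extra column per matching computer."""
    rows = []
    for user in users:
        # A fresh row per user, so nothing leaks in from the previous iteration.
        row = {"Name": user["name"], "Mail": user["mail"]}
        owned = [c["name"] for c in computers
                 if c["extensionAttribute1"] == user["mail"]]
        for i, pc in enumerate(owned, start=1):
            row[f"Computer{i}"] = pc
        rows.append(row)  # appended inside the loop, once per user
    return rows


def to_csv(rows):
    """Serialize rows to CSV text, using the union of all column names."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_rows(users, computers)
csv_text = to_csv(rows)
```

Collecting the union of column names before writing matters here because users own different numbers of computers; users with fewer computers get empty cells in the trailing columns.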

In the Linux kernel, the following vulnerability has been resolved:

bpf: Fix hashtab overflow check on 32-bit arches

The hashtab code relies on roundup_pow_of_two() to compute the number of hash buckets, and contains an overflow check by checking if the resulting value is 0. However, on 32-bit arches, the roundup code itself can overflow by doing a 32-bit left-shift of an unsigned long value, which is undefined behaviour, so it is not guaranteed to truncate neatly. This was triggered by syzbot on the DEVMAP_HASH type, which contains the same check, copied from the hashtab code. So apply the same fix to hashtab, by moving the overflow check to before the roundup.
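The ordering described in that fix can be modelled in a few lines. This is a Python sketch of the idea, not the kernel's C code: Python integers never overflow, so the 32-bit word size is an explicit assumption (WORD_BITS), and the function names are chosen only for illustration. The point is that the bound is checked before rounding up, so the shift that would be undefined on a 32-bit unsigned long is never attempted.

```python
WORD_BITS = 32  # assumption: models a 32-bit unsigned long


def roundup_pow_of_two(n):
    """Smallest power of two >= n, for n >= 1 (arbitrary-precision model)."""
    return 1 << (n - 1).bit_length()


def n_buckets(max_entries, word_bits=WORD_BITS):
    """Bucket count for max_entries, rejecting sizes that cannot be rounded up."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    # Check BEFORE rounding up: any value above 2**(word_bits - 1) would
    # round up to 2**word_bits, which does not fit in the machine word.
    if max_entries > 1 << (word_bits - 1):
        raise OverflowError("max_entries too large for this word size")
    return roundup_pow_of_two(max_entries)
```

Checking the result for 0 after the roundup, as the older code did, relies on the overflowing shift wrapping to zero, which is exactly what undefined behaviour does not guarantee.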

I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations. If the caller fills the hashtable all at once, I can populate the
stats immediately after that, but if it's populated incrementally, then I need to
update stats right before it's destroyed or reset, otherwise we can show tuple
size of the hashtable since its most recent reset, rather than a larger,
previous incarnation.

I was aware and afraid of that. Previously, I added this output only to
"explain analyze", and (as a quick, interim implementation) changed various
tests to use analyze, with memory shown only in "verbose" mode. But as Tomas
pointed out, that's consistent with what's done elsewhere.

Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output, by avoiding Memory/Disk. That doesn't attempt to make all
"explain analyze" output stable - there are other issues, I think mostly related
to parallel workers (see 4ea03f3f, 13e8b2ee). But it does allow retiring
explain_sq_limit and explain_parallel_sort_stats. I'm including my patch to
show what I mean, but I didn't enable it for hashtable "Buckets:". I guess in
either case, the tests shouldn't be included.