tcl with "unsigned int"

aotto1968

Aug 22, 2022, 4:20:54 AM

Below is a hash function from the Tcl source.
I want to use its "unsigned int" result as my own hash for internal purposes, so I tried to translate it to Tcl:

proc pHash { str } {
    set result 0
    if {[string length $str] > 0} {
        scan [string index $str 0] %c result
        Print str@30 result
        for {set idx 1} {$idx < [string length $str]} {incr idx} {
            scan [string index $str $idx] %c value
            set result [expr {$result + ($result << 3) + $value}]
            Print idx result
        }
    }
    format %08X $result
}

>> the result is frustrating <<

pHash SrvMkBufferCreateTLS

pHash -> str<SrvMkBufferCreateTLS>, | result<83>,
pHash -> idx<1>, | result<861>,
pHash -> idx<2>, | result<7867>,
pHash -> idx<3>, | result<70880>,
pHash -> idx<4>, | result<638027>,
pHash -> idx<5>, | result<5742309>,
pHash -> idx<6>, | result<51680898>,
pHash -> idx<7>, | result<465128184>,
pHash -> idx<8>, | result<4186153758>,
pHash -> idx<9>, | result<37675383923>,
pHash -> idx<10>, | result<339078455421>,
pHash -> idx<11>, | result<3051706098856>,
pHash -> idx<12>, | result<27465354889818>,
pHash -> idx<13>, | result<247188194008463>,
pHash -> idx<14>, | result<2224693746076264>,
pHash -> idx<15>, | result<20022243714686492>,
pHash -> idx<16>, | result<180200193432178529>,
pHash -> idx<17>, | result<1621801740889606845>,
pHash -> idx<18>, | result<14596215668006461681>,
pHash -> idx<19>, | result<131365941012058155212>,
1F11935808A528CC

The CORE problem is that in C the result is limited to 32 bits (unsigned int), and an ADD to a 32-bit integer
that overflows the upper bound wraps around and starts again at 0. The number is ALWAYS restricted to 32 bits.

→ *but* in Tcl the result just keeps growing and is *not* restricted to 32 bits.

Question: how do I add REAL 32-bit unsigned integers in Tcl?

Kind regards



unsigned int
TclHashObjKey(
    Tcl_HashTable *tablePtr,    /* Hash table. */
    void *keyPtr)               /* Key from which to compute hash value. */
{
    Tcl_Obj *objPtr = keyPtr;
    int length;
    const char *string = TclGetStringFromObj(objPtr, &length);
    unsigned int result = 0;

    /*
     * I tried a zillion different hash functions and asked many other people
     * for advice. Many people had their own favorite functions, all
     * different, but no-one had much idea why they were good ones. I chose
     * the one below (multiply by 9 and add new character) because of the
     * following reasons:
     *
     * 1. Multiplying by 10 is perfect for keys that are decimal strings, and
     *    multiplying by 9 is just about as good.
     * 2. Times-9 is (shift-left-3) plus (old). This means that each
     *    character's bits hang around in the low-order bits of the hash value
     *    for ever, plus they spread fairly rapidly up to the high-order bits
     *    to fill out the hash value. This seems works well both for decimal
     *    and non-decimal strings.
     *
     * Note that this function is very weak against malicious strings; it's
     * very easy to generate multiple keys that have the same hashcode. On the
     * other hand, that hardly ever actually occurs and this function *is*
     * very cheap, even by comparison with industry-standard hashes like FNV.
     * If real strength of hash is required though, use a custom hash based on
     * Bob Jenkins's lookup3(), but be aware that it's significantly slower.
     * Tcl does not use that level of strength because it typically does not
     * need it (and some of the aspects of that strength are genuinely
     * unnecessary given the rest of Tcl's hash machinery, and the fact that
     * we do not either transfer hashes to another machine, use them as a true
     * substitute for equality, or attempt to minimize work in rebuilding the
     * hash table).
     *
     * See also HashStringKey in tclHash.c.
     * See also HashString in tclLiteral.c.
     *
     * See [tcl-Feature Request #2958832]
     */

    if (length > 0) {
        result = UCHAR(*string);
        while (--length) {
            result += (result << 3) + UCHAR(*++string);
        }
    }
    return result;
}

Ralf Fassel

Aug 22, 2022, 4:58:16 AM
* aotto1968 <aott...@t-online.de>
| The CORE problem is that in C the result is limited to 32 bits (unsigned int), and an ADD to a 32-bit integer
| that overflows the upper bound wraps around and starts again at 0. The number is ALWAYS restricted to 32 bits.
>
| → *but* in Tcl the result just keeps growing and is *not* restricted to 32 bits.
>
| Question: how do I add REAL 32-bit unsigned integers in Tcl?

I'd say you need an "& 0xffffffff" somewhere in your TCL expr code to
limit to 32bits. Finding the correct position for that is left as an
exercise ;-)

R'

aotto1968

Aug 22, 2022, 5:09:11 AM
I chose

> set result [expr {($result + ($result << 3) + $value) & 0xffffffff}]

and hope it will be fine.
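
For reference, here is a minimal sketch of the whole proc with that mask applied after every step. The name pHash32 is made up for this illustration, and the debugging Print calls are dropped. It still works on character codes from "scan %c", so it is only meant for plain ASCII keys.

```tcl
# Sketch only: pHash32 is a hypothetical name, not part of Tcl.
proc pHash32 {str} {
    set result 0
    if {[string length $str] > 0} {
        # Seed with the code point of the first character.
        scan [string index $str 0] %c result
        for {set idx 1} {$idx < [string length $str]} {incr idx} {
            scan [string index $str $idx] %c value
            # result*9 + value, then keep only the low 32 bits,
            # mimicking the wraparound of a 32-bit unsigned int in C.
            set result [expr {($result + ($result << 3) + $value) & 0xffffffff}]
        }
    }
    format %08X $result
}

puts [pHash32 SrvMkBufferCreateTLS]   ;# prints 08A528CC
```

That value is exactly the low 8 hex digits of the unmasked result 1F11935808A528CC from the original post, which is what one would expect from arithmetic modulo 2^32.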

Ralf Fassel

Aug 22, 2022, 10:51:09 AM
* aotto1968 <aott...@t-online.de>
| > I'd say you need an "& 0xffffffff" somewhere in your TCL expr code to
| > limit to 32bits. Finding the correct position for that is left as an
| > exercise ;-)
| >
| I chose
>
| > set result [expr {($result + ($result << 3) + $value) & 0xffffffff}]
>
| and hope it will be fine.

Note that Tcl's "scan %c" does something different from the C function
for chars > 127 (notably non-ASCII Unicode): in Tcl you get _one_ value
per character (the code point, which may well be larger than 255), while
the C code sees two or more bytes per character, depending on the byte
encoding of the string. If you need an exact replica of the C function,
you probably need to get at the individual bytes at the Tcl level.
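
One way to get at the bytes could look like the sketch below. It assumes that the standard UTF-8 encoding is close enough to what the C code sees. Tcl's internal string representation is a slightly modified UTF-8, so NUL characters and possibly some characters outside the BMP may still hash differently. The proc name pHashBytes is made up for this example.

```tcl
# Sketch only: pHashBytes is a hypothetical name, and the use of standard
# UTF-8 (instead of Tcl's internal representation) is an assumption.
proc pHashBytes {str} {
    # Encode the string as UTF-8 and split it into unsigned byte values.
    set byteList {}
    binary scan [encoding convertto utf-8 $str] cu* byteList
    set result 0
    foreach byte $byteList {
        # Starting from 0, the first step yields just the first byte,
        # which matches the seed used in the C code.
        set result [expr {($result + ($result << 3) + $byte) & 0xffffffff}]
    }
    format %08X $result
}

puts [pHashBytes SrvMkBufferCreateTLS]   ;# 08A528CC (pure ASCII, one byte per char)
puts [pHashBytes \u00e9]                 ;# 00000784 (bytes C3 A9, not code point E9)
```

For pure ASCII keys the byte values and the "scan %c" values coincide, so the two approaches only diverge once non-ASCII characters show up.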

Also I'm not sure if you need another "&..." after the left-shift, since
in C "$result << 3" will probably also overflow. Don't know if this
matters for the final result or the result as a hash value (been a while
since I did bitshifting).
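
As far as I can tell the extra mask is not needed: "& 0xffffffff" is a reduction modulo 2^32, and both addition and left shift give the same low 32 bits whether or not the intermediate value was reduced first. A small sketch to check this on the example key (the two helper procs are made up for the comparison):

```tcl
# Hypothetical helpers for comparison; neither is part of Tcl.
proc stepOneMask {result value} {
    # Mask once, after the whole step.
    expr {($result + ($result << 3) + $value) & 0xffffffff}
}
proc stepTwoMasks {result value} {
    # Additionally mask right after the shift, as a C compiler effectively does.
    expr {($result + (($result << 3) & 0xffffffff) + $value) & 0xffffffff}
}

set a 0
set b 0
foreach ch [split SrvMkBufferCreateTLS ""] {
    scan $ch %c value
    set a [stepOneMask $a $value]
    set b [stepTwoMasks $b $value]
}
puts [expr {$a == $b}]   ;# 1: both variants agree
puts [format %08X $a]    ;# 08A528CC
```

By the same argument, masking only once at the very end would also give the same value, at the cost of letting the intermediate integers grow along the way.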

R'