XOR two 31-bit unsigned integers much faster than XOR two 32-bit unsigned integers?

393 views
Skip to first unread message

Joran Greef

unread,
May 30, 2012, 10:31:15 AM5/30/12
to v8-u...@googlegroups.com
I am implementing a table hash (http://en.wikipedia.org/wiki/Tabulation_hashing) and noticed that a table hash using a table of 31-bit unsigned integers is almost an order of magnitude faster than a table hash using a table of 32-bit unsigned integers.

The former has an average hash time of 0.00007ms per 20 byte key for a 31-bit hash, and the latter has an average hash time of 0.00034ms per 20 byte key for a 32-bit hash.

I figured that XOR on 8-bit integers would be faster than XOR on 16-bit integers would be faster than XOR on 24-bit integers would be faster than XOR on 32-bit integers, but did not anticipate such a difference between 31-bit and 32-bit integers.

Is there something regarding XOR that I may be missing that could explain the difference?


Andreas Rossberg

unread,
May 30, 2012, 10:39:40 AM5/30/12
to v8-u...@googlegroups.com
On 30 May 2012 16:31, Joran Greef <jo...@ronomon.com> wrote:
> I figured that XOR on 8-bit integers would be faster than XOR on 16-bit
> integers would be faster than XOR on 24-bit integers would be faster than
> XOR on 32-bit integers, but did not anticipate such a difference between
> 31-bit and 32-bit integers.

V8, like many other GCed languages, uses a tagged integer
representation. Small integers can be unboxed into a 32-bit word, but
1 bit is reserved for the tag bit. For 32 bit wide integers, you need
the boxed representation.

/Andreas

Jakob Kummerow

unread,
May 30, 2012, 10:40:29 AM5/30/12
to v8-u...@googlegroups.com
As long as you're running unoptimized code on a 32-bit V8, this is expected: 31-bit integers are stored directly as "small integers", the 32nd bit is used to tag them as such, whereas 32-bit integers are converted to doubles and stored as objects on the heap, which makes accessing them more expensive.

When your code runs long enough for the optimizer to kick in, it should recognize this situation, use untagged 32-bit integer values in optimized code, and the difference between 31-bit and 32-bit values should go away. If it doesn't, please post a reduced test case that exhibits the behavior so that we can investigate. (Running the code for a second or so should be enough to get the full effect of optimization and make the initial difference negligible.)

Vyacheslav Egorov

unread,
May 30, 2012, 11:00:55 AM5/30/12
to v8-u...@googlegroups.com
Minor correction: 31-bit tagged _signed_ integers are used on ia32, on
x64 you get 32-bit tagged _signed_ integers. Neither though are wide
enough to contain all values from unsigned 32-bit integer range.

Thus if you are really using them as 32bit _unsigned_ integers, e.g.
you are doing something like:

var a = (b ^ c) >>> 0; // force into uint32 and then use in
non-int32-truncating manner.

then unfortunately even V8's optimizing compiler gets confused. It
does not have designated uint32 representation and does not try to
infer when int32 can be safely used instead of uint32 (another example
of this bug: http://code.google.com/p/v8/issues/detail?id=2097).

I suggest you post your code here if possible so that we could take a look.

--
Vyacheslav Egorov
> --
> v8-users mailing list
> v8-u...@googlegroups.com
> http://groups.google.com/group/v8-users

Joran Greef

unread,
May 30, 2012, 11:21:18 AM5/30/12
to v8-u...@googlegroups.com
The difference comes through when changing "integer % Math.pow(2,32)" to "integer % Math.pow(2,31)" when pre-generating the hash tables.

Hash tables containing 256 random 31-bit unsigned integers are pre-generated for every byte of key. The hash operates on fixed-length 20 byte keys.

Each byte of the key is XOR-red with one of the integers in the table, depending on the position of the byte in the key, so the XOR is dealing with an 8-bit unsigned integer and a 31-bit unsigned integer.

The result is cast to an unsigned integer by >>> 0.

Vyacheslav Egorov

unread,
May 30, 2012, 12:25:39 PM5/30/12
to v8-u...@googlegroups.com
Is it essential that your hash should be uint32 not int32?

I would assume you can get a good performance and 32 bit of hash if
you stay with int32's:

a) fill your tables with (integer % Math.pow(2,32)) | 0 to force
uint32 into int32 and
b) make your table a typed array instead of normal array: new
Int32Array(256); [not necessary if you are running x64 version of V8]
c) stop doing >>> 0 at the end;

--
Vyacheslav Egorov

Stephan Beal

unread,
May 30, 2012, 12:34:34 PM5/30/12
to v8-u...@googlegroups.com
On Wed, May 30, 2012 at 6:25 PM, Vyacheslav Egorov <veg...@chromium.org> wrote:
c) stop doing >>> 0 at the end;

A side question: what does >>>0 do? i have never seen the >>> operator before this thread and never seen a 0-bit shift anywhere.

:-?

--
----- stephan beal
http://wanderinghorse.net/home/stephan/

Joran Greef

unread,
May 30, 2012, 12:50:25 PM5/30/12
to v8-u...@googlegroups.com
Vyacheslav, thanks for your suggestions. I tried them out and together over 200 random keys each with 100,000 iterations, it's about 0.000048ms per key (hash output is 32-bit signed).

Ultimately the hash result is being used as an index into a hash table so it needs to be cast to unsigned integer at some point. If I do a) and b) but then cast it, it's about 0.00022ms per key (hash output is 32-bit unsigned).

If I use a Uint32Array or normal array, with 31-bit unsigned integers as hashing table constants, and then cast the result to unsigned, it's about 0.000048ms per key (hash output is 31-bit unsigned).

Is there something obvious I'm missing to get an index into a hash table bucket using a 32-bit signed integer without slowing things down?

I did also notice along the way that if I run the original 32-bit unsigned integer hash first a few times, and then the faster 31-bit unsigned integer hash, then the 31-bit hash is instead two orders of magnitude slower.

Vyacheslav Egorov

unread,
May 30, 2012, 12:47:15 PM5/30/12
to v8-u...@googlegroups.com
>>> is a logical right shift. It shifts bits right filling vacant positions with 0 (as opposed to >> which fills vacant positions with sign bit).

In JavaScript >>> also performs ToUint32 on the input before doing
shift (as opposed to >> which converts input via ToInt32) so the
result is always a number from unsigned 32-bit integer range.

See: http://es5.github.com/#x11.7.3

x >>> 0 is basically ToUint32(x)

--
Vyacheslav Egorov

Joran Greef

unread,
May 30, 2012, 1:02:00 PM5/30/12
to v8-u...@googlegroups.com
To clarify, Array instead of Uint32Array is slightly slower as expected: 0.000054ms per key vs 0.000048ms per key.

Vyacheslav Egorov

unread,
May 30, 2012, 1:10:06 PM5/30/12
to v8-u...@googlegroups.com
Interesting. If unsigned integers are completely eliminated I would
expect it to be as fast as 31-bit version. Strange to still the slow
down still.

I would expect that conversion to bucket is independent of hash sign,
it should be the same: bucket_index = hash & mask; where mask =
number_of_buckets - 1 (and number_of_buckets is a power of 2).

--
Vyacheslav Egorov

On Wed, May 30, 2012 at 7:02 PM, Joran Greef <jo...@ronomon.com> wrote:
> To clarify, Array instead of Uint32Array is slightly slower as expected:
> 0.000054ms per key vs 0.000048ms per key.
>

Joran Greef

unread,
May 30, 2012, 1:10:45 PM5/30/12
to v8-u...@googlegroups.com
Keeping everything as Int32 but casting using "return hash >>> 0" is 0.00022ms per key.

Keeping everything as Int32 but casting using "return hash < 0 ? 4294967296 + hash : hash" is 0.000051ms per key, order of magnitude faster and still 32-bit.

Vyacheslav Egorov

unread,
May 30, 2012, 1:41:09 PM5/30/12
to v8-u...@googlegroups.com
So I threw in some code: https://gist.github.com/2837822 That does
something similiar to what you described (but not necessarily
completely similiar): it just does some int32 bitwise operations.

On my machine x64.release build of V8 takes 8ms (0.00008 ms per key).

If I start casting result to uint32 via >>> 0, I get to 36ms
(0.00036ms per key).

Limiting number of bits inside the table does not improve anything
(which is quite expected with code shaped like that).

--
Vyacheslav Egorov

Joran Greef

unread,
May 31, 2012, 5:28:11 AM5/31/12
to v8-u...@googlegroups.com
You got it just right.

Added more comparisons testing casting and unrolled the hash.


0.00047ms per key for tableU32, return hash >>> 0
0.00005ms per key for tableU31, return hash >>> 0
0.00004ms per key for tableS32, return hash
0.00025ms per key for tableS32, return hash >>> 0
0.00005ms per key for tableS32, return hash < 0 ? 4294967296 + hash : hash
Reply all
Reply to author
Forward
0 new messages