Good random hash function?

2,586 views
Skip to first unread message

Rune Skovbo Johansen

unread,
Sep 2, 2012, 8:23:24 AM9/2/12
to procedur...@googlegroups.com
For large worlds where only parts are generated at a time, a typical need is to get a random number associlated with an input vector (such as a location in the world), and this random number should always be the same given the same input. This is referred to as a hash function - not to be confused with random number generators, where each random number is dependent on the previous one.

But procedural generation is not the typical use of hash functions, and not all hash functions are well suited for procedural generation, as they may either not have sufficiently random distribution, or be unnecessarily expensive.

What I need is a hash function that takes 3 or 4 integers as input and outputs a random number (for example either a float between 0 and 1 or an integer between zero and Int32.MaxValue). Do anyone have suggestions for a good hash function for this purpose? The question has been asked before, but I haven't yet seen any satisfactory answers.

So far I've been looking at Long period hash functions for procedural texturing (2006) but this algorithm returns integers in a very small range (e.g. a number between 0 and 16; other ranges are possible but they're all small) which is not sufficient for my needs.

Some people suggest using MD5 or other cryptographic hash functions. While they sound like they fulfill the strict requirements, they're likely much more complex than needed. For example, I just need a 32-bit int as return value, while cryptographic hash functions often return much larger hash values. I'd be throwing out most of it, which seems like a waste of calculations.

For continuous noise I'm using simplex noise. I've considered using this also when the input is integers, but internally is uses floating point numbers, which have worse stability the further you get from the origin. I think it can be rewritten to use integers for coordinates, but I'm still guessing the algorithm is overkill for randomness that doesn't need to be continuous at all.

So basically, good randomness properties, no reliance on floating point coordinates, and relatively cheap. Any suggestions?

Rune

Lucas, Simon M

unread,
Sep 3, 2012, 3:45:43 AM9/3/12
to procedur...@googlegroups.com

Hi Rune,

 

From what you say below it sounds like a random number generator

would work fine, where you use the 3 or 4 integers as the seed,

and then call random.nextInt() to get your random hash.  If the inputs

have too many bits for a single seed, then use more than one RNG

and combine their outputs in some way (e.g. add them), and then take

it modulo the required range.

 

Hope this helps.

 

   Simon Lucas

Peter Mawhorter

unread,
Sep 3, 2012, 12:59:25 PM9/3/12
to procedur...@googlegroups.com
My instinct would be to use multidimensional simplex noise. If you're
really concerned about floating point errors, you could set up a
scheme where you sampled from multiple simplex noise functions with
origins spread out in a grid and used a linear combination of the (say
4) nearest noise functions to get your number. That way you'd never be
too far from the origin of the simplex noise, since the origin that
you were using would always be within whatever tolerance you wanted.
Of course, if the continuous nature of the noise isn't important to
you, you can do away with the linear combination idea and just always
pick the origin of your simplex noise as the closest point on some
(incredibly large, of course) grid.

On the other hand, it might be worth looking into what games like
Noctis and Minecraft use (although whether that information is
available I don't know).

-Peter Mawhorter

Rune Skovbo Johansen

unread,
Sep 3, 2012, 2:26:42 PM9/3/12
to procedur...@googlegroups.com
On Monday, September 3, 2012 9:45:56 AM UTC+2, Lucas, Simon M wrote:

From what you say below it sounds like a random number generator

would work fine, where you use the 3 or 4 integers as the seed,

and then call random.nextInt() to get your random hash.

That's a common misconception. Random number generators are made to generate good randomness of consecutive numbers in the same seed sequence, not "parallel" numbers in different seed sequences. I made a quick test (in C#, using System.Random) to confirm and illustrate it - see this image:

http://runevision.com/blog/images/20120903_random_seeds.png

This is the C# code I used (with warmup values of 0, 10, and 1000):

        Texture2D tex = new Texture2D (256, 256);
        int seed = 0;
        for (int i=0; i<tex.width; i++) {
            for (int j=0; j<tex.height; j++) {
                System.Random rand = new System.Random (seed);
                seed++;
                for (int n=0; n<warmup; n++)
                    rand.NextDouble ();
                float f = (float)rand.NextDouble ();
                tex.SetPixel (i, j, new Color (f,f,f,1));
            }
        }

So I'm afraid it's not as simple as that. :)

Rune

Craig Reynolds

unread,
Sep 3, 2012, 2:37:04 PM9/3/12
to procedur...@googlegroups.com
If I recall correctly, this 2009 SIGGRAPH paper on texture synthesis uses an approach like Simon Lucas suggests: generating random points within grid cells, where the per-cell random sequence is seeded by the grid cell indices.

Procedural noise using sparse Gabor convolution
http://graphics.cs.kuleuven.be/publications/LLDD09PNSGC/

Lucas, Simon M

unread,
Sep 3, 2012, 5:36:15 PM9/3/12
to procedur...@googlegroups.com

Hi Rune,

 

I think it might be as simple as that.

 

e.g. – I’ve written this Java function to do a similar

thing to your texture example:

 

Where func(i,j) returns the intensity of pixel i,j

 

    public int func(int i, int j) {

        Random r = new Random(i * j);

        int x = r.nextInt() % 256;

        return Math.abs(x);

    }

 

Image below (not sure whether this will be posted)

 

 

It looks fine, I think.

 Note: the details of the RNG are important,

 and it’s usually much safer to take the low order

bits (which I did by doing % 256) rather than the high order

ones which you did implicitly by calling nextDouble

 

Cheers,

 

   Simon

Rune Skovbo Johansen

unread,
Sep 4, 2012, 3:58:37 AM9/4/12
to procedur...@googlegroups.com
Hi Simon,

That's interesting. I tried to first replicate your code exactly, but I got a distinct pattern. I looked into the implementations of Random in Java versus C# and noted that C# takes the given seed as it, while Java modifies the given seed with this formula:

this.seed = (seed ^ 0x5DEECE66DL) & ((1L << 48) - 1);

Once I added the same modification to the seed in my C# code I got results comparable to yours.

About using the low order bits; I found it to not make a difference. My original code gave the same kind of patterns when I used .Next() % 256 (.Next() returns a positive int in C#). And with the seed modification, it gave the same apparent randomness regardless of whether I used Next() % 256 or NextDouble().

So it seems like the randomness here is solely down to the seed modification done with the above formula, and although it looks random in the texture you posted, I'm not sure it's really all that random.

If you try to look at the values generated on the line from coord (1, 0) to coord (1, 65536) you'll see distinct patterns. (I created a 256x256 texture with the random values in that line to verify by changing i*j to i+256*j.)

On a side note, basing the seed on i*j creates a diagonal mirror axis and also creates the same seed for any coord where either i or j is 0.

Cheers,

Rune

Lucas, Simon M

unread,
Sep 4, 2012, 4:27:49 AM9/4/12
to procedur...@googlegroups.com

Hi Rune,

 

Using the Java RNG, taking the higher-order bits (via calls to nextDouble) makes a big difference:

 

 

(produced with:

 

   public int func(int i, int j) {

        Random r = new Random(i * j);

        int x = (int) (r.nextDouble() * 256);   // note: r.nextInt(256) produces the same banded pattern

        return Math.abs(x);  // Math.abs no longer necessary in this version

    }

 

Regarding the mirror image in line i=j, yes, that’s right, but for a given task that may or may not be a problem.

And on closer inspection, there are identifiable patterns in the “random texture” I posted yesterday.

 

Do you have an objective measure of

how fit for *your* purpose one of these hashing functions is? 

 

Cheers,

 

   Simon

 

 

From: procedur...@googlegroups.com [mailto:procedur...@googlegroups.com] On Behalf Of Rune Skovbo Johansen
Sent: 04 September 2012 08:59
To: procedur...@googlegroups.com
Subject: Re: [pcg] Good random hash function?

 

Hi Simon,



That's interesting. I tried to first replicate your code exactly, but I got a distinct pattern. I looked into the implementations of Random in Java versus C# and noted that C# takes the given seed as it, while Java modifies the given seed with this formula:

this.seed = (seed ^ 0x5DEECE66DL) & ((1L << 48) - 1);


Once I added the same modification to the seed in my C# code I got results comparable to yours.

About using the low order bits; I found it to not make a difference. My original code gave the same kind of patterns when I used .Next() % 256 (.Next() returns a positive int in C#). And with the seed modification, it gave the same apparent randomness regardless of whether I used Next() % 256 or NextDouble().

So it seems like the randomness here is solely down to the seed modification done with the above formula, and although it looks random in the texture you posted, I'm not sure it's really all that random.

If you try to look at the values generated on the line from coord (1, 0) to coord (1, 65536) you'll see distinct patterns. (I created a 256x256 texture with the random values in that line to verify by changing i*j to i+256*j.)

On a side note, basing the seed on i*j creates a diagonal mirror axis and also creates the same seed for any coord where either i or j is 0.

Cheers,

Rune

On Monday, September 3, 2012 11:37:06 PM UTC+2, Lucas, Simon M wrote:

Hi Rune,

 

I think it might be as simple as that.

 

e.g. – I’ve written this Java function to do a similar

thing to your texture example:

 

Where func(i,j) returns the intensity of pixel i,j

 

    public int func(int i, int j) {

        Random r = new Random(i * j);

        int x = r.nextInt() % 256;

        return Math.abs(x);

    }

 

Image below (not sure whether this will be posted)

 

 

It looks fine, I think.

Rune Skovbo Johansen

unread,
Sep 4, 2012, 5:15:19 AM9/4/12
to procedur...@googlegroups.com
On Tuesday, September 4, 2012 10:28:00 AM UTC+2, Lucas, Simon M wrote:
Regarding the mirror image in line i=j, yes, that’s right, but for a given task that may or may not be a problem.

And on closer inspection, there are identifiable patterns in the “random texture” I posted yesterday.

Do you have an objective measure of how fit for *your* purpose one of these hashing functions is?

I'm going to use the hash function in code that may apply all kinds of transforms to both the input coordinates and the output values, so it shouldn't be possible to apply some such transform and get a result with recognizable patterns. Basically something like K2 randomness as described on this page. Consequtive numbers in the same random stream of a good RNG fulfill such demands and much more but they are not hash functions. Cryptographic level hash functions also fulfill these demands, but fulfill a lot more cryptography related demands as well (K3 or K4 level) that make them computationally expensive.

Rune

Rune Skovbo Johansen

unread,
Sep 4, 2012, 2:57:05 PM9/4/12
to procedur...@googlegroups.com
On Monday, September 3, 2012 6:59:57 PM UTC+2, Peter wrote:
My instinct would be to use multidimensional simplex noise.

Ok, I just tried using a 3-dimensional simplex noise. Unfortunately I found out that there's a recognizable pattern here as well. Every coordinate (i, j, k) where i, j, and k are all multiples of 3 return 0.
 
On the other hand, it might be worth looking into what games like
Noctis and Minecraft use (although whether that information is
available I don't know).

According to this Minecraft uses 64 bit doubles for the terrain noise, meaning that terrain far from the origin may "start generating weird structures, such as huge blocks of solid material".

Rune

Adam Smith

unread,
Sep 4, 2012, 7:52:40 AM9/4/12
to procedur...@googlegroups.com
Hi Rune,

I get what you are asking for here, and a hash is what you are after.
Of functions you'll find in libraries, it seems like good old MD5 is
the closest you will get. However, you are also right to notice that
MD5 does a lot more work than necessary to produce it's
thoroughly-mixed 128 bit output.

Instead of just running the full MD5 and keeping the first 32 bits,
you could roll your own algorithm in the style of the original
algorithm. The general idea is to make each of the output bits
causally depend on as many of the input bits as you can without too
much effort (and to spread small ranges in the input over large ranges
in the output).

Call your input integers i, j and your hashed output h. Trying h = i ^
j certainly makes h depend on every bit of the input, but each bit of
h only touches one bit out of each input integer (and the same one for
all values). Just making up weird mutually dependent boolean
expression with and/or/shifts can lead to some really pretty pictures
(and I've seen this used for music generation), but it doesn't give
you the hash-like results you are looking for.

Here are a few bits of intuition I've got:
* Bitwise ands tend to reduce the number of 1 bits wile bitwise ors
tend to increase the number of 1 bits. Using either of these in an
unbalanced way leaks entropy.
* Xors are a nice way to mix the bits from two vectors without leaking
entropy, but you need to do something more than just xor to get bits
in different places into contact.
* Rotates (and the shifts you use to make them) can general shift any
noise you've created from one octave to another, but they won't create
noise on their own.
* Sums and differences (which aren't used as much in crypto because
they cost more in hardware and analysis) are kind of combo
and-shift-xor units and (with the help of long-distance carries) can
mix bits up pretty well in one operation.

Below is a little recipe I cooked up just now that can be unrolled to
use only 37 single-cycle operations.

int rot(int x, int b) {
return (x << b) ^ (x >> (32-b)); // broken rotate because I'm in java and lazy
}

int pcgHash(int i, int j) {
int a = i;
int b = j;
for (int r = 0; r < 3; r++) {
a = rot((a^0xcafebabe) + (b^0xfaceb00c), 23);
b = rot((a^0xdeadbeef) + (b^0x8badf00d), 5);
}
return a^b;
}

Here's a picture of the result tabulated as 24-bit colors for i and j
form 0 to 1023 each: http://imgur.com/W9Gde Perceptually, I'd say
that's not all that different from plain uniform RGB noise.

This algorithm seems fairly flexible with respect to choice of the big
hex constants (hence the choice of hexspeak words). The choice of
shift amounts for the broken rotate function were chosen, amongst a
few alternatives I tried, to give the best noisy look. The number of
cycles through the loop was chosen to be as small as possible without
allowing obvious patterns to form. Everything above 3 looks the same
to me, and just 2 cycles lead to some ugly bands. It felt sad to just
use either b as the final result (it looks decent), so a is mainly
xor'ed onto the result out of pity.

Maybe with better choice of constants and skipping a few operations
here and there you could reduce the 37 operations required to make a
pretty (noisy) picture here to something like half of that.

Good luck exploring!

Adam

On Tue, Sep 4, 2012 at 1:27 AM, Lucas, Simon M <s...@essex.ac.uk> wrote:
>
> Hi Rune,
>
>
>
> Using the Java RNG, taking the higher-order bits (via calls to nextDouble)
> makes a big difference:
>
>
>
>
>

Rune Skovbo Johansen

unread,
Sep 7, 2012, 5:01:44 AM9/7/12
to procedur...@googlegroups.com
Thanks Adam! That sounds promising.

I'm currently looking into running the ENT randomness test on the output from various algorithms. Once I get that working I'll test it with your hash function, murmur hash (as someone else suggested) and other alternatives and compare the results. I'll post my findings here once I have them.

Rune

Lucas, Simon M

unread,
Sep 7, 2012, 6:07:36 AM9/7/12
to procedur...@googlegroups.com

Hi Rune, Adam,

 

So far we’ve had some fun looking at the textures

generated by various methods,

but I think it would be also be interesting to experiment

with quantitative measures of hash quality.

 

Two components to it:

·         an even spread of output values over the set of inputs,

·         some measure of non-locality (at least, if the measure

is to correlate with the subjective ratings we’re been using

so far).

 

Adam, your colour image looks pretty random, but I thought I could

 see at least one smiley faces in it!  Also, by using 24 bit output, you’re reducing

the probability of collisions (in fact, going from a 20 bit input to

a 24 bit output it would be possible to have no collisions; have you looked at

the 8 bit grey-scale version?).

Rune Skovbo Johansen

unread,
Sep 9, 2012, 11:57:46 AM9/9/12
to procedur...@googlegroups.com
After a lot of testing I've found what looks like a good hash function. I used the ENT Pseudorandom Number Sequence Test Program to test the quality of randomness of various hash functions:
  • System.Random - Calling C# System.Random sequentially. Not a hash function but included for comparison.
  • MurmurHash3 - A hash function my colleague Aras Pranckevičius suggested I look at.
  • PcgHash - Hash function proposed in this thread by Adam Smith.
  • SeedRand - Calling C# System.Random using the input as the seed value.
  • SeedRand Modified - Like SeedRand but altering the seed the same way Java does it.
  • Linear - Just a linear function (input * 29 / 13) for comparison.
I dropped the requirement of the functions taking multiple ints as input, since it can be easily worked around. For easy comparison, all the functions take just one integer as input. (For the PcgHash function the second coord is always 0).

Test results

The test results can be seen here:
http://runevision.com/blog/images/20120909_random_hash_comparison.png

The top images shows the first 65536 random values listed in 256 rows of 256 pixels each.

The different test values are explained on the ENT page. I found the Monte Carlo Value for Pi value particularly interesting since it corresponds well with an intuitive judgement of randomness, but with much better precision than the naked eye.

Reading how the Monte Carlo Value for Pi is calculated inspired me to do the bottom images. They are generated by taking every 6 random values and creating an x,y coordinate from them using the first 3 values for x and the remaining 3 for y. This coordinate is plotted into the image. The resulting images show patterns in the randomness much clearer than when just plotting the random values one by one as in the top images.

Conclusions

For all purposes relevant to procedural generation, the System.Random in C# generate very good random values. It's not a hash function, but a hash function with equally good randomness is considered adequate provided it is not too expensive.

However, calling System.Random with different seeds is confirmed to result in very bad randomness. On top of that, it's very expensive too, presumably because creating each new random stream has a big overhead.

The PcgHash proposed by Adam Smith looks good to the naked eye when its values are plotted directly into an image. It's also very cheap. However, the ENT test shows that the randomness is not quite at the same quality as System.Random, and the image with plotted coordinates confirms that there are clear patterns in the randomness.

Finally, the MurmurHash3 seems to be as good as System.Random in every respect, and plotting its coordinates doesn't reveal any patterns. It's also pretty cheap. Note that the original MurmurHash3 takes a byte array as input and internally convert those to integers. I made a new version that takes an integer array directly, skipping the conversions back and forth. This improved the execution time about 3-fold while not affecting the output.

I also tested a SimplexNoise derived hash function, but it was so bad I didn't find it worth including it in the comparison, especially considering it wasn't meant as a hash function in the first place.

There is no doubt other hash functions out there which could have been tested too, but without having any suggestions to go by I didn't know where to start. I'm very satisfied with the results of MurmurHash3 in any case.

If anyone want the C# source code for the altered MurmurHash3 or the testing code, let me know. :)

Rune

Rune Skovbo Johansen

unread,
Jan 2, 2015, 7:18:32 AM1/2/15
to procedur...@googlegroups.com
Sorry for reopening a 2 years old thread, but I just wanted to let you know that I have now written a blog post about this topic:

Primer on Repeatable Random Numbers

Since I've heard a lot of misconceptions about pseudo random numbers over the years (misconceptions that I once had myself too), I think it's worth reading for anyone using random numbers in procedural generation. :)

I also created a BitBucket repository with my modified MurmurHash3 implementation as well as the testing framework I used to create the test images comparing the various hash functions and RNGs.

Rune

Pier Luca Lanzi

unread,
Jan 2, 2015, 7:47:08 AM1/2/15
to procedur...@googlegroups.com
Great article very useful.
PL

--

---
You received this message because you are subscribed to the Google Groups "Procedural Content Generation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proceduralcont...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages