Fibonacci 9

Saustin Grody

Aug 3, 2024, 5:54:21 PM
to netarvintwax

But it is slow. If I do map fibonacci [1..] it really slows down as the numbers grow. I'm guessing this is overhead from how much stack is being used and the sheer number of calculations: each call recurses all the way down to a and b rather than just adding the last two numbers together.
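
The exact definition being discussed is not quoted in this thread, and map fibonacci [1..] is Haskell; still, here is a minimal sketch of this kind of naive recursion, in C++ for illustration, with base values a = 1 and b = 2 assumed so that fibonacci 3 is 3 as stated below:

    #include <cstdint>

    // Naive doubly-recursive Fibonacci: every call recomputes the entire
    // subtree below it, so e.g. fibonacci(5) makes nine calls in total.
    uint64_t fibonacci(uint64_t n) {
        if (n == 1) return 1;  // a (assumed base value)
        if (n == 2) return 2;  // b (assumed base value)
        return fibonacci(n - 1) + fibonacci(n - 2);
    }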

As you can see, fibonacci 3 is called twice, fibonacci 2 is called three times and fibonacci 1 is called twice (in this very small example). There is a huge amount of overlap: you call the same function with the same arguments multiple times. This is of course inefficient: once you know that fibonacci 3 is 3, there is no need to calculate it a second time. In this very small example that's not really an issue, since calculating fibonacci 3 takes nanoseconds. But if you have to calculate fibonacci 100 multiple times, it has a dramatic impact. The number of redundant calls also scales exponentially with n, so this is not some small problem that only matters in the margin.

What you can do is work with accumulators: variables you pass along recursively and update as you go. For fibonacci there are two such variables, f(n-2) and f(n-1). On each call you calculate the sum of the two and shift them along, like so:
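
The original snippet is not quoted here, so this is a sketch of such an accumulator version, again in C++ for illustration; the helper name and seed values are mine, chosen to match fib(1) = 1 and fib(2) = 2 above:

    #include <cstdint>

    // Accumulator version: prev and curr carry two consecutive Fibonacci
    // values along, so fibonacci(n) makes n calls to fib_acc instead of
    // exponentially many.
    static uint64_t fib_acc(uint64_t n, uint64_t prev, uint64_t curr) {
        if (n == 1) return curr;
        return fib_acc(n - 1, curr, prev + curr);  // shift the pair forward
    }

    uint64_t fibonacci(uint64_t n) {
        return fib_acc(n, 1, 1);  // seeds so that fib(1) = 1, fib(2) = 2
    }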

This results in five calls (against nine calls for the original code). Of course the number of calls is no guarantee of performance, since some methods do more work per call, but our approach scales linearly with n whereas the original approach scales exponentially with n. So even if the original method were thousands of times faster per call, eventually the difference in the number of calls becomes so huge that it is outperformed.

This gives a pretty even distribution: the number 0 comes up three times, and all other numbers come up twice. Every number is also far removed from the previous and the next number, and if we increase the input by one, the output jumps around quite a bit. So this is starting to look like a good hash function, and also a good way to map a number from a larger range into the range from 0 to 7.

In fact we already have the whole algorithm right here. All we have to do to get an arbitrary power of two range is to change the shift amount. So if my hash table is size 1024, then instead of just looking at the top 3 bits I want to look at the top 10 bits. So I shift by 54 instead of 61. Easy enough.
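
As a sketch, the whole algorithm then fits in one line of C++; the multiplier 11400714819323198485 is 2^64 divided by the golden ratio, written in hex as the 0x9E3779B97F4A7C15 constant that comes up later in this thread:

    #include <cstdint>

    // Fibonacci hashing: multiply by 2^64 / golden_ratio (wrapping mod 2^64)
    // and keep the top bits. For a table of size 8, log2_table_size = 3 and
    // the shift is 61; for size 1024 it is 10, and the shift is 54.
    uint64_t fibonacci_hash(uint64_t hash, int log2_table_size) {
        return (hash * 11400714819323198485ull) >> (64 - log2_table_size);
    }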

Multiplicative hashing is a simple type of hash function often used by teachers introducing students to hash tables. Multiplicative hash functions are simple and fast, but have higher collision rates in hash tables than more sophisticated hash functions.

To measure that I wrote a small program that takes a hash h1 and runs it through Fibonacci hashing to get a slot in the hash table, s1. Then I change a single bit in h1, giving me h2, and after I run that through Fibonacci hashing I get a slot s2. Then I measure, depending on which bit I changed in h1, which bits are likely to change in s2 compared to s1 and which bits are unlikely to change.
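
A reconstruction of what such a test could look like (a sketch, not the author's actual program; names are mine):

    #include <cstdint>

    // Same Fibonacci hashing as above, parameterized on output bits.
    static uint64_t fib_hash(uint64_t h, int bits) {
        return (h * 11400714819323198485ull) >> (64 - bits);
    }

    // For each input bit, flip it and record which output bits of the slot
    // change; flip_counts[in_bit][out_bit] accumulates over all samples.
    void avalanche_counts(int bits, const uint64_t* samples, int n,
                          uint64_t flip_counts[64][64]) {
        for (int i = 0; i < n; ++i) {
            uint64_t h1 = samples[i];
            uint64_t s1 = fib_hash(h1, bits);
            for (int in_bit = 0; in_bit < 64; ++in_bit) {
                uint64_t h2 = h1 ^ (1ull << in_bit);  // change a single bit
                uint64_t s2 = fib_hash(h2, bits);
                uint64_t diff = s1 ^ s2;              // output bits that changed
                for (int out_bit = 0; out_bit < bits; ++out_bit)
                    if ((diff >> out_bit) & 1)
                        ++flip_counts[in_bit][out_bit];
            }
        }
    }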

I then run that same test every time after doubling the hash table, because different table sizes have different numbers of bits in the output: if the hash table only has four slots, there are only two bits in the output; if it has 1024 slots, there are ten bits in the output. Finally I color-code the result and plot the whole thing as a picture that looks like this:

Let me explain this picture. Each row of pixels represents one of the 64 bits of the input hash: the bottom-most row is the first bit, the topmost row is the 64th bit. Each column represents one bit in the output. The first two columns are the output bits for a table of size 4, the next three columns are the output bits for a table of size 8, and so on, until the last 23 columns are for a table of size roughly eight million. The color coding means this: red, in particular, marks an input bit that never affects a given output bit.

The worst pattern we can see is at the top of the picture: the last bit of the input hash (the top row in the picture) can only ever affect the last bit of the output slot (the last column of each section). So if the table has 1024 slots, the last bit of the input hash can only determine the bit for the value 512 in the output slot; it will never change any other bit in the output. Lower bits in the input can affect more bits in the output, so there is more mixing going on for those.

This one has more randomness at the top, but a clearer pattern at the bottom. All that red means that the first few bits in the input hash can only determine the first few bits in the output. Which makes sense for integer modulo: a small number modulo a large number will never result in a large number, so a change to a small number can never affect the higher bits.

This one is obviously bad: the upper bits of the hash value have completely red rows, because they will simply get chopped off. Only the lower bits of the input have any effect, and they can only affect their own bits in the output. This picture shows why a power-of-two size requires care with your choice of hash function for the hash table: if those red rows represent important bits, you will simply lose them.
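
That is the masking case; as a sketch:

    #include <cstdint>

    // Power-of-two indexing by masking: only the low log2(size) bits of the
    // hash survive, every upper bit is simply chopped off.
    uint64_t mask_slot(uint64_t hash, uint64_t table_size) {
        return hash & (table_size - 1);  // table_size must be a power of two
    }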

Finally here is the link to my implementation of unordered_map. My other two hash tables are also there: flat_hash_map has very fast lookups and bytell_hash_map is also very fast but was designed more to save memory compared to flat_hash_map.

Why does Fibonacci hashing not work as a hash function alone? It should, unless you have a use case that results in lots of Fibonacci numbers being inserted. You can use the identity function as the hash when you use Fibonacci hashing to map the hash to an index. In fact you can also use the identity function when using prime number sizes. Only for power-of-two sizes and for fastrange do you need a separate hash function, or when your type is bigger than 64 bits, obviously.
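
In code, that could look like the following sketch; the identity_hash name is hypothetical, and it assumes the table applies Fibonacci hashing internally when mapping the hash to an index:

    #include <cstddef>
    #include <cstdint>

    // The per-type hash function just returns the key; the Fibonacci
    // multiplication in the hash-to-index step does all the mixing.
    struct identity_hash {
        size_t operator()(uint64_t key) const { return key; }
    };
    // usage sketch: ska::unordered_map<uint64_t, Value, identity_hash>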

I think it is possible to improve fastrange so that it also uses the low bits, but why not just use Fibonacci hashing? To improve fastrange you would have to slow it down. Fibonacci hashing is already the same speed and gives better results.
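
For reference, fastrange maps a hash into [0, range) by taking the high half of a widening multiply, which is why it only really uses the upper bits of its input; a sketch of the 64-bit version:

    #include <cstdint>

    // fastrange: scale hash into [0, range) via the high 64 bits of a
    // 64x64 -> 128-bit multiply. Low input bits barely influence the result.
    uint64_t fastrange64(uint64_t hash, uint64_t range) {
        return (uint64_t)(((unsigned __int128)hash * range) >> 64);
    }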

Then we have two cases:
1. The type is smaller than 64 bits. This includes ints, floats, small custom structs or small std::pairs.
2. The type is variable size or larger than 64 bits. This includes std::string and other sequences.

The brotli code actually documents some of the properties they were looking for when searching for this number, and references a heuristic they ran tests against in order to optimize it. Interestingly, you can see this number popping up in a number of open-sourced Google tools that need extremely fast hashing (snappy, brotli, webp).

The previous one was a) close to the golden ratio, b) prime, and c) had each sub-byte within some special set of ranges as described by Knuth. IIRC all of these actually helped a tiny bit on the benchmarks when we changed the hash (it was originally something home-grown that was both quite bad and slow), but the gain from the last criterion was so microscopic that it could just have been luck.

My guesstimate is that it will no longer match naturally occurring patterns, while still showing some sort of unique spread. That, or the emergence of a disastrous interfering pattern, but I think that is a bit less likely with these kinds of ratios.

I was playing around with different constants before posting the blog post, but I only ever found bad constants (and found lots of new ways that constants can be bad), so I left that part out of the text.

Well, but there is no such thing as step_size=1024*1024 in my code, because I always try to keep the randomness in the lower bits, or even better, to keep the randomness uniformly distributed over all bits of the 64-bit integer.

Since Fibonacci multiples have essentially random values in the lower bits, I wonder if you can break up any remaining ill-conditioned patterns by XOR-ing the result of the multiplication with the input value itself. Intuitively, it should combine the strong hashing power of the multiplication with the excellent spreading you would otherwise see for a Fibonacci pattern mapped 1:1 into a power-of-two-sized table.
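
Rendered as code, the suggestion might look like this sketch (my rendering, untested for quality):

    #include <cstdint>

    // XOR the multiplied value with the original input before taking the
    // top bits, hoping to combine the multiply's mixing with the
    // golden-ratio spreading of sequential inputs.
    uint64_t fib_hash_xor(uint64_t h, int bits) {
        uint64_t mixed = (h * 11400714819323198485ull) ^ h;
        return mixed >> (64 - bits);
    }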

Only if you disregard rounding, I believe. The beauty of Fibonacci hashing is that with incrementing inputs, the output will always fall in the largest open interval (i.e. as far as possible from any earlier element).

Yes, any irrational number will wrap around without getting back to the starting point. And I actually spent a good amount of time searching for a replacement number that has fewer problematic numbers.

Your golden-ratio-based multiplier, 0x9E3779B97F4A7C15, is pretty close. I see one F and one sequence of 7 1-bits; popcount is 38. So you missed my criteria, but not by much. That might say more about my criteria being overly pessimistic than about your constant being bad.

You can accelerate division by computing an analog of the inverse of a number with the Newton-Raphson method. To compute the division, you multiply the inverse by the number you want to divide and then take the high bits. The result is potentially 1 less than the division you wanted, so you check for that and you are done. If you store the inverse, and the division is going to be repeated more than three times, it should already be worth it according to my benchmarks. Computing and applying the inverse takes around 2.7 times as long as a hardware division, but a hardware division takes 13.3 times longer than just applying the inverse; that is, when the divisor is momentarily constant.
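
A sketch of the idea under stated assumptions: here the fixed-point inverse is computed with one up-front hardware division rather than with Newton-Raphson, but the amortization is the same; setup is paid once, and each subsequent division is a multiply, a shift and a correction:

    #include <cstdint>

    // Division by a repeated ("momentarily constant") divisor via a stored
    // fixed-point inverse m = floor(2^64 / d). Requires d >= 2.
    struct FastDivider {
        uint64_t d;  // the divisor
        uint64_t m;  // its fixed-point "inverse"

        explicit FastDivider(uint64_t divisor)
            : d(divisor),
              m((uint64_t)(((unsigned __int128)1 << 64) / divisor)) {}

        uint64_t divide(uint64_t n) const {
            // The high 64 bits of n * m give either n / d or n / d - 1,
            // so check for the off-by-one the comment above mentions.
            uint64_t q = (uint64_t)(((unsigned __int128)n * m) >> 64);
            if (n - q * d >= d) ++q;
            return q;
        }
    };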

Still, I managed to find 5 primes that produce a nice form of optimized code (all 5 primes compile to the same sequence of asm instructions with 2 different constants), which could be implemented as a simple function with 2 arguments (besides the input hash).

Modular arithmetic makes a difference. If you multiply 0x9e3779b97f4a7c15 by 0xf1de83e19937733d, you get 0x957bbf35006ed6770000000000000001, a large 128-bit number. If you only look at the low 64 bits, you get 1. If you only consider the remainder after dividing by 2^64 (as computers do), the product is 1 and each of the two factors is the inverse of the other.
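
This is easy to check in C++, since unsigned 64-bit multiplication already wraps, i.e. it is arithmetic modulo 2^64:

    #include <cassert>
    #include <cstdint>

    int main() {
        uint64_t a = 0x9e3779b97f4a7c15ull;
        uint64_t b = 0xf1de83e19937733dull;
        // The 128-bit product's low 64 bits are 1, and wrapping 64-bit
        // multiplication keeps exactly those low 64 bits.
        assert(a * b == 1);
        return 0;
    }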
