Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
hash function over IP address
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 37 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Mark  
View profile  
 More options Apr 4 2012, 4:37 pm
Newsgroups: comp.programming, comp.unix.programmer
From: "Mark" <mark_cruzNOTFORS...@hotmail.com>
Date: Wed, 4 Apr 2012 16:37:50 -0400
Local: Wed, Apr 4 2012 4:37 pm
Subject: hash function over IP address
Hello

I'm trying to find a way create a hash value over multiple IP addresses (I
keep a list of IP addresses and its number is variable. The simplest method
I've found so far is (obviously not the most effective and collision free):

unsigned long hash_ipaddr(struct in_addr *addr)
{
    unsigned long res;

    if (addr == NULL)
        return 0UL;

    res = (((addr->s_addr >> 24) & 0xff) * 256) + \
          (((addr->s_addr >> 16) & 0xff) * 256) + \
          (((addr->s_addr >>  8) & 0xff)* 256) + \
          (addr->s_addr & 0xff);

    return res;

}

And then sum up hashes for every IP and store it.

Can somebody suggest some better approach for my task?
Thanks.

Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Barry Margolin  
View profile  
 More options Apr 4 2012, 4:49 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Barry Margolin <bar...@alum.mit.edu>
Date: Wed, 04 Apr 2012 16:49:05 -0400
Local: Wed, Apr 4 2012 4:49 pm
Subject: Re: hash function over IP address
In article <jliber$oc...@speranza.aioe.org>,

How about s_addr % modulus, where modulus is a large (> 2^16) prime
number?

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark  
View profile  
 More options Apr 4 2012, 5:21 pm
Newsgroups: comp.programming, comp.unix.programmer
From: "Mark" <mark_cruzNOTFORS...@hotmail.com>
Date: Wed, 4 Apr 2012 17:21:20 -0400
Local: Wed, Apr 4 2012 5:21 pm
Subject: Re: hash function over IP address

"Barry Margolin" <bar...@alum.mit.edu> wrote in message

news:barmar-B6E941.16490504042012@news.eternal-september.org...
[skip]

>> And then sum up hashes for every IP and store it.

>> Can somebody suggest some better approach for my task?
>> Thanks.

> How about s_addr % modulus, where modulus is a large (> 2^16) prime
> number?

Thanks for suggestion. Is it common approach to sum up the produced hashes?

Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Pfaff  
View profile  
 More options Apr 4 2012, 5:40 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Ben Pfaff <b...@cs.stanford.edu>
Date: Wed, 04 Apr 2012 14:40:32 -0700
Local: Wed, Apr 4 2012 5:40 pm
Subject: Re: hash function over IP address

"Mark" <mark_cruzNOTFORS...@hotmail.com> writes:
> I'm trying to find a way create a hash value over multiple IP addresses (I
> keep a list of IP addresses and its number is variable. The simplest method
> I've found so far is (obviously not the most effective and collision free):

Why not use one of many available high-quality hash functions?
Three that come to mind without even going to a search engine:
Jenkins lookup3, murmurhash, FNV.
--
Ben Pfaff
http://benpfaff.org

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Pitts  
View profile  
 More options Apr 4 2012, 5:47 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Daniel Pitts <newsgroup.nos...@virtualinfinity.net>
Date: Wed, 04 Apr 2012 14:47:30 -0700
Local: Wed, Apr 4 2012 5:47 pm
Subject: Re: hash function over IP address
On 4/4/12 2:21 PM, Mark wrote:

> "Barry Margolin"<bar...@alum.mit.edu>  wrote in message
> news:barmar-B6E941.16490504042012@news.eternal-september.org...
> [skip]
>>> And then sum up hashes for every IP and store it.

>>> Can somebody suggest some better approach for my task?
>>> Thanks.

>> How about s_addr % modulus, where modulus is a large (>  2^16) prime
>> number?

> Thanks for suggestion. Is it common approach to sum up the produced hashes?

If order matters, what I've seen is basically the following:

for each item in list:
    hash = hash * someSmallPrime + hashOf(item)


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rainer Weikusat  
View profile  
 More options Apr 4 2012, 5:52 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Rainer Weikusat <rweiku...@mssgmbh.com>
Date: Wed, 04 Apr 2012 22:52:46 +0100
Local: Wed, Apr 4 2012 5:52 pm
Subject: Re: hash function over IP address

Ben Pfaff <b...@cs.stanford.edu> writes:
> "Mark" <mark_cruzNOTFORS...@hotmail.com> writes:

>> I'm trying to find a way create a hash value over multiple IP addresses (I
>> keep a list of IP addresses and its number is variable. The simplest method
>> I've found so far is (obviously not the most effective and collision free):

> Why not use one of many available high-quality hash functions?
> Three that come to mind without even going to a search engine:
> Jenkins lookup3, murmurhash, FNV.

The simple answer is these hash functions are designed to be used for
large, variable length string keys NOT for small, fixed-size
quantities which can be compared with a single machine instruction.

http://burtleburtle.net/bob/hash/integer.html
http://www.cris.com/~Ttwang/tech/inthash.htm


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Pfaff  
View profile  
 More options Apr 4 2012, 6:35 pm
Newsgroups: comp.programming, comp.unix.programmer
From: b...@cs.stanford.edu (Ben Pfaff)
Date: Wed, 04 Apr 2012 15:35:47 -0700
Local: Wed, Apr 4 2012 6:35 pm
Subject: Re: hash function over IP address

He has a list of multiple IP addresses.  A hash function for
variable length keys may well be appropriate.
--
Peter Seebach on managing engineers:
"It's like herding cats, only most of the engineers are already
 sick of laser pointers."

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rick Jones  
View profile  
 More options Apr 4 2012, 7:18 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Rick Jones <rick.jon...@hp.com>
Date: Wed, 4 Apr 2012 23:18:27 +0000 (UTC)
Local: Wed, Apr 4 2012 7:18 pm
Subject: Re: hash function over IP address
In comp.unix.programmer Mark <mark_cruzNOTFORS...@hotmail.com> wrote:

> I'm trying to find a way create a hash value over multiple IP
> addresses (I keep a list of IP addresses and its number is
> variable. The simplest method I've found so far is (obviously not
> the most effective and collision free):
> unsigned long hash_ipaddr(struct in_addr *addr)

No IPv6?

rick jones
--
Process shall set you free from the need for rational thought.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William Ahern  
View profile  
 More options Apr 4 2012, 7:57 pm
Newsgroups: comp.programming, comp.unix.programmer
From: William Ahern <will...@wilbur.25thandClement.com>
Date: Wed, 4 Apr 2012 16:57:37 -0700
Local: Wed, Apr 4 2012 7:57 pm
Subject: Re: hash function over IP address

Mark <mark_cruzNOTFORS...@hotmail.com> wrote:
> Hello
> I'm trying to find a way create a hash value over multiple IP addresses (I
> keep a list of IP addresses and its number is variable. The simplest
> method I've found so far is (obviously not the most effective and
> collision free):

http://cmph.sourceforge.net/

Also, because IP addresses are fixed size, perhaps the most practical
algorithm to use is binary search. Far easier to implement than perfect
hashing, with no storage overhead--a simple linear array (or two, for v4 and
v6) of addresses.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Eric Sosman  
View profile  
 More options Apr 4 2012, 9:05 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Eric Sosman <esos...@ieee-dot-org.invalid>
Date: Wed, 04 Apr 2012 21:05:56 -0400
Local: Wed, Apr 4 2012 9:05 pm
Subject: Re: hash function over IP address
On 4/4/2012 4:37 PM, Mark wrote:

     (Idle question: Why the backslashes?)

>      return res;

     Briefly, this calculation takes four chunks of the address
as numbers between 0..255 and adds them to produce a sum 0..1020.
So: You start with 32 bits of information and squash them down to
a smidgen less than 10 bits -- with a moderate central tendencey,
too.  Ponder, young one: Wise this is?

> }

> And then sum up hashes for every IP and store it.

     Not sure what you mean by this.  A given IP produces one and
only one hash value with this scheme (or any other sane scheme),
so what do you mean by "sum up the hashes?"

> Can somebody suggest some better approach for my task?

     You haven't explained what your task is, so I'll have to guess.
If the objective is to use a hash table keyed on IP addresses, there
are two likely possibilities:

     - The number of buckets in the table is a prime number (or at
       the very least an odd number): Just take `*addr % N', and the
       outcome is very likely to be better, especially if N > 1020.
       The remainder after division by a prime is likely to break up
       simple patterns.

     - The number of buckets is a power of two. Now `*addr % N' would
       just throw away the high-order bits -- which might not be Bad,
       but could easily be Less Than Good if there are patterns in the
       input keys.  One possibility is to use `f(*addr) % N', where f()
       is a suitable "scrambling" function: For example, implement a
       two-line linear congruential pseudo-random number generator, seed
       it with `*addr', and take its output as f().

     For further ideas, see TAOCP volume III.

--
Eric Sosman
esos...@ieee-dot-org.invalid


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Barry Margolin  
View profile  
 More options Apr 4 2012, 10:52 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Barry Margolin <bar...@alum.mit.edu>
Date: Wed, 04 Apr 2012 22:52:33 -0400
Local: Wed, Apr 4 2012 10:52 pm
Subject: Re: hash function over IP address
In article <jlir5m$dd...@dont-email.me>,
 Eric Sosman <esos...@ieee-dot-org.invalid> wrote:

That's not what I see.  It looks to me like he's adding the three
high-order bytes together, shifting that 8 bits to the left, and then
adding in the low-order byte.  So he's shrinking a 32-bit value down to
an 18-bit value.  Another way to write his expression is:

(((addr->s_addr >> 16) && 0xff00) +
 ((addr->s_addr >> 8) && 0xff00) +
 (addr->s_addr && 0xffff))

Hash functions are expected to squash a large value into a much smaller
one -- that's the whole point, isn't it?  Although it's hard understand
why one would bother when the hash is more than half the size of the
original.

> > }

> > And then sum up hashes for every IP and store it.

>      Not sure what you mean by this.  A given IP produces one and
> only one hash value with this scheme (or any other sane scheme),
> so what do you mean by "sum up the hashes?"

I think what he REALLY wants is a hash of a list of IPs.  So he's
hashing each IP, then combining all the hashes together by adding them.

A better solution might be to concatenate all the IPs, then use a
standard hash function on that.  If you don't want the result to depend
on the order of the IPs, sort the list before concatenating them.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
BGB  
View profile  
 More options Apr 4 2012, 11:21 pm
Newsgroups: comp.programming, comp.unix.programmer
From: BGB <cr88...@hotmail.com>
Date: Wed, 04 Apr 2012 20:21:25 -0700
Local: Wed, Apr 4 2012 11:21 pm
Subject: Re: hash function over IP address
On 4/4/2012 1:37 PM, Mark wrote:

maybe:
((addr->s_addr*65521)>>16);

then the hashed IP address is reduced to around 16 bits, and maybe these
can be added and then masked off to the desired number of bits?...

this maybe isn't very good either, oh well.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Udit Gangwani  
View profile  
 More options Apr 5 2012, 3:26 am
Newsgroups: comp.programming
From: Udit Gangwani <udit...@gmail.com>
Date: Thu, 5 Apr 2012 00:26:19 -0700 (PDT)
Local: Thurs, Apr 5 2012 3:26 am
Subject: Re: hash function over IP address

Hashing is preferable if you have large number of keys and you know the number of keys before hand. In that case hashing gives you the best results. Moreover, Hashing is more common data structure for the purpose of indexing non-integral keys.

> I'm trying to find a way create a hash value over multiple IP addresses (I
> keep a list of IP addresses and its number is variable. The simplest method
> I've found so far is (obviously not the most effective and collision free):

In your case the number of IPAddresses are variable and your keys are integral.
So you can use these points for your benefit to use a more better approach.

1st Approach :
   If the number of IPAddress will not exceed a very large value( < 100000), then you can use AVL Trees/Red Black Trees. If you are using C++, you can use std::map<> data structure directly for this purpose.
These will give you in worst case the order of O(15-17) for Insertion, Deletion and Searching. Plus it will save you a lot of time in creating your own hash data structure and making sure it is giving you best performance.

2nd Approach :
  If you know the range of you IP addresses, lets say between 192.168.0.0 to 192.168.10.256. Then you can directly allocate an array of size of the number of IP Addresses in this range. And you can perform the constant searching by subtracting the minimum IP Address from your queried IP Address to get the index into the array.

But if the above two approaches doesnot suit your needs then you can always use some libraries which provide hash datastructures already implemented in them.

For c++ check out this link :
http://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-ta...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rainer Weikusat  
View profile  
 More options Apr 5 2012, 7:08 am
Newsgroups: comp.programming, comp.unix.programmer
From: Rainer Weikusat <rweiku...@mssgmbh.com>
Date: Thu, 05 Apr 2012 12:08:58 +0100
Local: Thurs, Apr 5 2012 7:08 am
Subject: Re: hash function over IP address

Calculating the hash value must be cheaper than the full comparisons
which will be needed for a linear search. And performing extensive
byte-shuffling on machine words is not a good strategy to achieve
that.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rainer Weikusat  
View profile  
 More options Apr 5 2012, 7:15 am
Newsgroups: comp.programming, comp.unix.programmer
From: Rainer Weikusat <rweiku...@mssgmbh.com>
Date: Thu, 05 Apr 2012 12:15:40 +0100
Local: Thurs, Apr 5 2012 7:15 am
Subject: Re: hash function over IP address

"Mark" <mark_cruzNOTFORS...@hotmail.com> writes:
> unsigned long hash_ipaddr(struct in_addr *addr)
> {
>     unsigned long res;

>     if (addr == NULL)
>         return 0UL;

>     res = (((addr->s_addr >> 24) & 0xff) * 256) + \
>           (((addr->s_addr >> 16) & 0xff) * 256) + \
>           (((addr->s_addr >>  8) & 0xff)* 256) + \
>           (addr->s_addr & 0xff);

>     return res;
> }

Unrelated question (Barry Margolin also noticed that but I'd like to
try expressing it somewhat clearer): This function adds the three
highest octets together in the high octet of a 16-bit integer and
then adds the lowest octet of the original address to the lowest octet
of this 16-bit integer whose value was zero before this addition. This
seems rather odd. Is this intentional?

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Barry Margolin  
View profile  
 More options Apr 5 2012, 7:52 am
Newsgroups: comp.programming, comp.unix.programmer
From: Barry Margolin <bar...@alum.mit.edu>
Date: Thu, 05 Apr 2012 07:52:09 -0400
Local: Thurs, Apr 5 2012 7:52 am
Subject: Re: hash function over IP address
In article <87fwcio2eb....@sapphire.mobileactivedefense.com>,
 Rainer Weikusat <rweiku...@mssgmbh.com> wrote:

He apparently wants each address in a /24 to generate a different hash.  
This seems reasonable if he expects his application to have to deal with
many clients from the same subnet, and avoid clashes among them.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark  
View profile  
 More options Apr 5 2012, 8:42 am
Newsgroups: comp.programming, comp.unix.programmer
From: "Mark" <mark_cruzNOTFORS...@hotmail.com>
Date: Thu, 5 Apr 2012 08:42:00 -0400
Subject: Re: hash function over IP address
"Eric Sosman" <esos...@ieee-dot-org.invalid> wrote in message

news:jlir5m$dd8$1@dont-email.me...

[skip]

>> Can somebody suggest some better approach for my task?

>     You haven't explained what your task is, so I'll have to guess.
> If the objective is to use a hash table keyed on IP addresses, there
> are two likely possibilities:

First of all, thanks for comments. What I'm trying to solve is to find a way
to search in  AVL tree used in IP routing application. Initially a tree's
node looked simple:

struct nhlfe_entry
{
  u_int32_t nhlfe_ix;
  u_char opcode;
  ...
  struct nhlfe_key key;  /* nexthop key */

}

/* defines a search key. */
struct nhlfe_key
{
  struct in_addr nh_addr;
  u_int32_t oif_ix;
  u_int32_t out_label;

}

So the search was based on 'struct nhlfe_key'. Now, what I'm trying to add
support for multiple next hops, that is I add a linked list in nhlfe_entry:
struct nhlfe_entry{  u_int32_t nhlfe_ix;  u_char opcode;  ...  struct list
*nhkey_list;}where 'struct list' is struct listnode that embeds 'void *data'
pointer to caller'sprivate data, and this is 'struct nhlfe_key'.So I'm
trying to find the way to generate key based on multiple elements in the
list to store/search nodes in the tree. One of options I was considering
hash function, but it now looks unfeasible to me; the other is to always
have a key based on the very first element of the list (but whenever list
changes, re-insert node in the tree); and the third is to have two different
containers - one is AVL tree where key is IP address and the second is list
or array of 'nhlfe_entry', and each tree node should have some reference to
elements from the 2nd container.. So not sure which one to proceed with.

Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Eric Sosman  
View profile  
 More options Apr 5 2012, 8:46 am
Newsgroups: comp.programming, comp.unix.programmer
From: Eric Sosman <esos...@ieee-dot-org.invalid>
Date: Thu, 05 Apr 2012 08:46:17 -0400
Local: Thurs, Apr 5 2012 8:46 am
Subject: Re: hash function over IP address
On 4/4/2012 10:52 PM, Barry Margolin wrote:

     Nor I, now that my eyes are open.  Thanks for the catch.  (Sigh.)

> It looks to me like he's adding the three
> high-order bytes together, shifting that 8 bits to the left, and then
> adding in the low-order byte.  So he's shrinking a 32-bit value down to
> an 18-bit value.  Another way to write his expression is:

> (((addr->s_addr>>  16)&&  0xff00) +
>   ((addr->s_addr>>  8)&&  0xff00) +
>   (addr->s_addr&&  0xffff))

> Hash functions are expected to squash a large value into a much smaller
> one -- that's the whole point, isn't it?  Although it's hard understand
> why one would bother when the hash is more than half the size of the
> original.

     Yeah.  He's still got the central-tendency problem with three
of the bytes, too.  Even if he's got a 2**18-bucket table and will
use this hash directly as the bucket index, I doubt he'll get a
good distribution: I'd expect more-than-normal collisions in the
buckets near 97920, with more-than-normal empty buckets in the "tails."

>>> And then sum up hashes for every IP and store it.

>>       Not sure what you mean by this.  A given IP produces one and
>> only one hash value with this scheme (or any other sane scheme),
>> so what do you mean by "sum up the hashes?"

> I think what he REALLY wants is a hash of a list of IPs.  So he's
> hashing each IP, then combining all the hashes together by adding them.

> A better solution might be to concatenate all the IPs, then use a
> standard hash function on that.  If you don't want the result to depend
> on the order of the IPs, sort the list before concatenating them.

     Or hash each individually and combine them with a commutative
operation like + or ^.  That's not usually a great idea because it
gives (A,B) and (B,A) the same hash -- but if he *wants* them to
have the same hash, well, ...

--
Eric Sosman
esos...@ieee-dot-org.invalid


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rainer Weikusat  
View profile  
 More options Apr 5 2012, 10:04 am
Newsgroups: comp.programming, comp.unix.programmer
From: Rainer Weikusat <rweiku...@mssgmbh.com>
Date: Thu, 05 Apr 2012 15:04:12 +0100
Local: Thurs, Apr 5 2012 10:04 am
Subject: Re: hash function over IP address

The simplest solution would be to concatenate all 'next hops' in
order to form a key. A better idea would be to build a so-called radix
search trie based on keys like this. The best online description of
that I found quickly is:

        http://www.cs.hut.fi/Opinnot/T-106.253/k2002/english/extra/digital_se...

At least the ancient version of this book

        http://www.informit.com/store/product.aspx?isbn=0201314525

I happen to own contains a good description. In contrast to this, the
Wikipedia article completely useless, except as an excellent example
how language can be used to obscure phenomenons instead of explaining
them.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Pfaff  
View profile  
 More options Apr 5 2012, 10:42 am
Newsgroups: comp.programming, comp.unix.programmer
From: b...@cs.stanford.edu (Ben Pfaff)
Date: Thu, 05 Apr 2012 07:42:50 -0700
Local: Thurs, Apr 5 2012 10:42 am
Subject: Re: hash function over IP address

You suggested http://burtleburtle.net/bob/hash/integer.html.
Take a look at the first suggested hash function for an integer
there.  GCC 4.4 with -O3 compiles this into 12 instructions.  And
then you have to figure out a way to combine multiple hashes for
subsequent integers.

Now look at, say, http://burtleburtle.net/bob/c/lookup3.c.  It
takes 12*3 instructions (according to the comments) to hash three
32-bit integers, which, per-integer, is the same rate, and you
don't have to invent a way to combine them.
--
"While the Melissa license is a bit unclear, Melissa aggressively
 encourages free distribution of its source code."
--Kevin Dalley <ke...@seti.org>


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark  
View profile  
 More options Apr 5 2012, 10:43 am
Newsgroups: comp.programming, comp.unix.programmer
From: "Mark" <mark_cruzNOTFORS...@hotmail.com>
Date: Thu, 5 Apr 2012 10:43:25 -0400
Local: Thurs, Apr 5 2012 10:43 am
Subject: Re: hash function over IP address

"Rainer Weikusat" <rweiku...@mssgmbh.com> wrote in message

news:87bon65l7n.fsf@sapphire.mobileactivedefense.com...

>> So the search was based on 'struct nhlfe_key'. Now, what I'm trying to
>> add
>> support for multiple next hops, that is I add a linked list in
>> nhlfe_entry:
>> struct nhlfe_entry{  u_int32_t nhlfe_ix;  u_char opcode;  ...  struct
>> list
>> *nhkey_list;}where 'struct list' is struct listnode that embeds 'void
>> *data'
>> pointer to caller'sprivate data, and this is 'struct nhlfe_key'.So I'm
>> trying to find the way to generate key based on multiple elements in the
>> list to store/search nodes in the tree.

> The simplest solution would be to concatenate all 'next hops' in
> order to form a key.

Number of next hops is variable, thus anyway it would require to delete the
node and re-insert it back with a new key generated, right? (that's not an
issue though). How would you suggest to concatenate 32-bit long unsigned
integers, in what form -- convert to strings, concatenate and use such
string as a key?

> A better idea would be to build a so-called radix
> search trie based on keys like this.

Unfortunately this isn't feasible at the moment, as it will require lots of
chnages all over the code and likely break the existing functionality.

Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rainer Weikusat  
View profile  
 More options Apr 5 2012, 11:29 am
Newsgroups: comp.programming, comp.unix.programmer
From: Rainer Weikusat <rweiku...@mssgmbh.com>
Date: Thu, 05 Apr 2012 16:29:57 +0100
Local: Thurs, Apr 5 2012 11:29 am
Subject: Re: hash function over IP address

b...@cs.stanford.edu (Ben Pfaff) writes:
> Rainer Weikusat <rweiku...@mssgmbh.com> writes:
>> b...@cs.stanford.edu (Ben Pfaff) writes:

[...]

That's a web page with some background explanations by an 'authority' you
were (indirectly) refering to and some examples of 'integer hash
functions' supposed to produce hash values with qualities Bob Jenkins
considered to be desirable. I wouldn't want to use them to hash IP
addresses, much as I also wouldn't want to use one of the 'monster
hashes' for string lookup.

> Now look at, say, http://burtleburtle.net/bob/c/lookup3.c.  It
> takes 12*3 instructions (according to the comments) to hash three
> 32-bit integers, which, per-integer, is the same rate, and you
> don't have to invent a way to combine them.

According to the table in

        http://burtleburtle.net/bob/hash/doobs.html

hashing 3 32-bit integers _byte by byte_ with lookup three is assumed
to take 80 instructions.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Pitts  
View profile  
 More options Apr 5 2012, 1:05 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Daniel Pitts <newsgroup.nos...@virtualinfinity.net>
Date: Thu, 05 Apr 2012 10:05:21 -0700
Local: Thurs, Apr 5 2012 1:05 pm
Subject: Re: hash function over IP address
On 4/5/12 4:08 AM, Rainer Weikusat wrote:

Really? How do you hash a string then? The point of a hash function is
often (not always) used to determine a bucket to look into.  It needn't
be faster than a comparison, and actually is often slower.
Cryptographically strong hashes are significantly slower than that
usually.

Often performance of the program is not dominated by the speed of the
hash-function, but the number of collisions created by it.  Having a
hash-function that is 10 times slower than a comparison, but creates an
even spread will generally outperform the "fastest" hash.  If you want a
fast hash, "return 0;"

> And performing extensive
> byte-shuffling on machine words is not a good strategy to achieve
> that.

True, but what else are you going to do?

The original posters problem, as I understand it, is that he has several
IP addresses that need to be hashed together.  I would assume that means
he has a table that is a map from a collection of IP addresses to some
object.

First, an IPV4 address is basically 32 bits.  The entropy is low on the
high-order bits and higher on the low order, which is fine for typical
hash use-cases.

So, he has basically an integer bag (The OP hasn't specified whether the
order of IPs matter, nor whether they may have duplicates)

If order doesn't matter, then the easiest correct approach is simply to
sum up the integer value of each IP (not each octant, but the whole IP).

If order does matter, then you might want Σ(IP[i] * smallPrime^i).
(where ^ is the exponent operator, not xor).  This can be implemented
as: int hash=0; foreach(IP in theList) hash = hash * smallPrime + IP;

Duplicates will be handled correctly for both cases, but the hashes will
only be equal if the same duplicates exist.

Either way, the actual speed of the hash function only matters if it
because a bottleneck to the performance of the program.  The requires
real-world measurement and profiling, and not a bunch of supposition
from wiseguys on a newsgroup :-).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rainer Weikusat  
View profile  
 More options Apr 5 2012, 1:12 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Rainer Weikusat <rweiku...@mssgmbh.com>
Date: Thu, 05 Apr 2012 18:12:16 +0100
Local: Thurs, Apr 5 2012 1:12 pm
Subject: Re: hash function over IP address

Really. If calculating the hash value is more expensive than the
comparisons it is supposed to avoid, it obviously buys you nothing.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Pitts  
View profile  
 More options Apr 5 2012, 1:23 pm
Newsgroups: comp.programming, comp.unix.programmer
From: Daniel Pitts <newsgroup.nos...@virtualinfinity.net>
Date: Thu, 05 Apr 2012 10:23:04 -0700
Local: Thurs, Apr 5 2012 1:23 pm
Subject: Re: hash function over IP address
On 4/5/12 10:12 AM, Rainer Weikusat wrote:

A single hash is supposed to save you multiple comparisons, not a single
comparison, so it can be many times more expensive than a single
comparison and still improve performance.  A hash that is 10 times
slower than a comparison, but allows you to find a bucket with only 3
out of 3000 entries saves you an average cost of 1487 comparisons.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 37   Newer >
« Back to Discussions « Newer topic     Older topic »