Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
Trying to convert small C++ subroutine (by Peter Weinberger) to Perl
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  8 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
David Filmer  
View profile  
 More options Feb 24 2009, 1:36 am
Newsgroups: comp.lang.perl.misc
From: David Filmer <use...@DavidFilmer.com>
Date: Mon, 23 Feb 2009 22:36:22 -0800 (PST)
Local: Tues, Feb 24 2009 1:36 am
Subject: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl
Greetings.

In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
(1989, Prentice Hall, page 14), I find a short hashing subroutine
("hash" as in cryptographic hash, not an associative array) by the
famous Peter Weinberger.

The subroutine accepts a string, and returns a numerical value. One
possible application of such a subroutine would be for storing (and
retrieving) of a vast number of files (many thousands or millions) -
the files could be stored in an arbitrary number of directories/
filesystems (ie, "hash buckets") and retrieved based on the hash value
of their filename.

I wish to convert this C++ subroutine to a Perl subroutine.  But my
knowledge of C++ is limited to what I learned in a class in college -
I have never coded in C++ and I am severely deficient in this
language.

Based on my limited C++ skills, I have made an effort to do this
conversion, but it does not produce the expected results.  I was
hoping someone here who was more familiar with C++ could point out the
error of my ways.

Here is subroutine from the book, based on Peter Weinberger's code:

int
hashpjw( char *s ) {
   const prime = 211;
   unsigned hash = 0, g;

   for( char *p = s; *p, p++ ) {
      hash = ( hash << 4 ) + *p;
      // assumes 32 bit int size
      if( g = hash & 0xf0000000 ) {
         hash ^= g >> 24;
         hsh ^= g;
      }
   }
   return hash % prime;

}

Here is my <lame>attempt</lame> to convert it to Perl:

#!/usr/bin/perl
use strict;

print hashpjw( 'bar', 123 );

sub hashpjw {
   my( $char, $s ) = @_;
   my $prime = 211;
   my ( $hash, $g, $p );
   for my $char( $p = $s, $p, $p++ ) {
      $hash = ( $hash << 4 ) + $p;
      if( $g = $hash & 0xf0000000 ) {
         $hash = $hash ^ ($g >> 24);
         $hash = $hash ^ $g;
      }
   }
   return $hash % $prime;

}

__END__

Any assistance is greatly appreciated!


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jens Thoms Toerring  
View profile  
 More options Feb 24 2009, 4:11 am
Newsgroups: comp.lang.perl.misc
From: j...@toerring.de (Jens Thoms Toerring)
Date: 24 Feb 2009 09:11:20 GMT
Local: Tues, Feb 24 2009 4:11 am
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl

David Filmer <use...@davidfilmer.com> wrote:
> In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
> (1989, Prentice Hall, page 14), I find a short hashing subroutine
> ("hash" as in cryptographic hash, not an associative array) by the
> famous Peter Weinberger.
> The subroutine accepts a string, and returns a numerical value. One
> possible application of such a subroutine would be for storing (and
> retrieving) of a vast number of files (many thousands or millions) -
> the files could be stored in an arbitrary number of directories/
> filesystems (ie, "hash buckets") and retrieved based on the hash value
> of their filename.

Mmmm, but you also have to take into account possible "hash
collisions"...

Your 'prime' is awfully small, with that you never get more
than 211 different hash values...

> Here is my <lame>attempt</lame> to convert it to Perl:
> #!/usr/bin/perl
> use strict;

use warnings;

;-)

> print hashpjw( 'bar', 123 );

Why do you pass it two arguments if the C/C++ version only
takes a single one, the string of which you want to compute
the hash value?

> sub hashpjw {
>    my( $char, $s ) = @_;
>    my $prime = 211;
>    my ( $hash, $g, $p );
>    for my $char( $p = $s, $p, $p++ ) {
>       $hash = ( $hash << 4 ) + $p;
>       if( $g = $hash & 0xf0000000 ) {
>          $hash = $hash ^ ($g >> 24);
>          $hash = $hash ^ $g;
>       }
>    }
>    return $hash % $prime;
> }

Try this instead (and pass it just the string)

sub hashpjw {
    my ( $s, $prime, $hash ) = ( shift, 211, 0 );

    for ( split //, $s ) {
        $hash = ( $hash << 4 ) + ord $_;
        if ( my $g = $hash & 0xf0000000 ) {
            $hash ^= $g >> 24;
            $hash ^= $g;
        }
    }

    return $hash % $prime;

}

One warning: the C/C++ version is rather likely meant to be used
with 8-bit characters. The Perl version as it is will happily
accept e.g. Unicode characters and then the result will differ
from the C/C++ version!
                           Regards, Jens
--
  \   Jens Thoms Toerring  ___      j...@toerring.de
   \__________________________      http://toerring.de

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
smallpond  
View profile  
 More options Feb 24 2009, 12:18 pm
Newsgroups: comp.lang.perl.misc
From: smallpond <smallp...@juno.com>
Date: Tue, 24 Feb 2009 09:18:51 -0800 (PST)
Local: Tues, Feb 24 2009 12:18 pm
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl
On Feb 24, 1:36 am, David Filmer <use...@DavidFilmer.com> wrote:

> #!/usr/bin/perl
> use strict;

> print hashpjw( 'bar', 123 );

Call hashpjw with two args, a string 'bar' and a number 123

> sub hashpjw {
>    my( $char, $s ) = @_;

Get two args: $char = 'bar' and $s = 123

>    for my $char( $p = $s, $p, $p++ ) {

Here you define a new $char hiding the one that has 'bar'.

Also this loop is almost certainly not doing what you want, unless
what you want is very strange:

perl -e 'for $c ($p=123, $p, $p++) { print $c,"\n" }'
124
124
123

> Any assistance is greatly appreciated!

Start by treating perl as a different language than C.  First off,
there are no pointers, so trying to do pointer arithmetic is, well,
pointless.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Uri Guttman  
View profile  
 More options Feb 24 2009, 12:30 pm
Newsgroups: comp.lang.perl.misc
From: Uri Guttman <u...@stemsystems.com>
Date: Tue, 24 Feb 2009 12:30:01 -0500
Local: Tues, Feb 24 2009 12:30 pm
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl

>>>>> "DF" == David Filmer <use...@DavidFilmer.com> writes:

hi dave!

  DF> In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
  DF> (1989, Prentice Hall, page 14), I find a short hashing subroutine
  DF> ("hash" as in cryptographic hash, not an associative array) by the
  DF> famous Peter Weinberger.

well, perl hashes are called that because they use a hash function to
allow O(1) (or close to it) access!

  DF> The subroutine accepts a string, and returns a numerical value. One
  DF> possible application of such a subroutine would be for storing (and
  DF> retrieving) of a vast number of files (many thousands or millions) -
  DF> the files could be stored in an arbitrary number of directories/
  DF> filesystems (ie, "hash buckets") and retrieved based on the hash value
  DF> of their filename.

which is also how perl hashes work!

  DF> int
  DF> hashpjw( char *s ) {
  DF>    const prime = 211;
  DF>    unsigned hash = 0, g;

  DF>    for( char *p = s; *p, p++ ) {
  DF>       hash = ( hash << 4 ) + *p;
  DF>       // assumes 32 bit int size
  DF>       if( g = hash & 0xf0000000 ) {
  DF>          hash ^= g >> 24;
  DF>          hsh ^= g;
  DF>       }
  DF>    }
  DF>    return hash % prime;
  DF> }

  DF> print hashpjw( 'bar', 123 );

you pass a string to $char but a number to $s

  DF> sub hashpjw {
  DF>    my( $char, $s ) = @_;

the c++ code declares only a single char pointer arg which is similar to
a perl string. and as i pointed out above, you pass in two args.

  DF>    my $prime = 211;
  DF>    my ( $hash, $g, $p );
  DF>    for my $char( $p = $s, $p, $p++ ) {

that redeclares $char so the string arg you pass in is never used. also
this loop just scans the string to be hashed. there are many easy idioms
for this. this s about the simplest but may be slow on very long
strings:

        foreach my $char ( split //, $s ) {

this works better on long strings:

        while( my ($char) = $s =~ /(.)/sg ) {

  DF>       $hash = ( $hash << 4 ) + $p;

you never set $p to anything. it needs to be the next char. so change $p
to $char. *p is accessing the next char in the c++ string.

  DF>       if( $g = $hash & 0xf0000000 ) {
  DF>          $hash = $hash ^ ($g >> 24);
  DF>          $hash = $hash ^ $g;
  DF>       }

that looks ok but i can't tell without testing.

  DF>    }
  DF>    return $hash % $prime;

same here.

well, you never scan the input string which is never properly passed
in. fix those bugs and see what happens. also test this out with a
single char string so you can debug all the steps more easily.

uri

--
Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
s...@netherlands.com  
View profile  
 More options Feb 24 2009, 1:24 pm
Newsgroups: comp.lang.perl.misc
From: s...@netherlands.com
Date: Tue, 24 Feb 2009 18:24:46 GMT
Local: Tues, Feb 24 2009 1:24 pm
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl
On 24 Feb 2009 09:11:20 GMT, j...@toerring.de (Jens Thoms Toerring) wrote:

>David Filmer <use...@davidfilmer.com> wrote:
>> In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
>> (1989, Prentice Hall, page 14), I find a short hashing subroutine

[snip]

The oputput for this is the same as the C function, for
'bar' its 168. However the C function as typed in won't compile
without some fixes.

If there were ever a need for speed, the last two $hash asignments
can be combined into a single asignment.

$hash = ($hash ^ $g >> 24) ^ $g;

-sln


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
s...@netherlands.com  
View profile  
 More options Feb 24 2009, 1:32 pm
Newsgroups: comp.lang.perl.misc
From: s...@netherlands.com
Date: Tue, 24 Feb 2009 18:32:30 GMT
Local: Tues, Feb 24 2009 1:32 pm
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl

[snip]

Jen has a nice Perl solution.
But, the C code won't compile under C++, probably some typo's.

C++ does not support default-int
There is no default type for const.
The for loop has a ',' where it should have a ';'.

Fixed up and speedier version:

int
hashpjw( char *s ) {
   const int prime = 211;
   unsigned hash = 0, g;

   for( char *p = s; *p; p++ ) {
      hash = ( hash << 4 ) + *p;
      // assumes 32 bit int size
      if( g = hash & 0xf0000000 )
         hash = (hash ^ g >> 24) ^ g;
   }
   return hash % prime;

}

-sln

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ted Zlatanov  
View profile  
 More options Feb 25 2009, 4:16 pm
Newsgroups: comp.lang.perl.misc
From: Ted Zlatanov <t...@lifelogs.com>
Date: Wed, 25 Feb 2009 15:16:16 -0600
Local: Wed, Feb 25 2009 4:16 pm
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl
On 24 Feb 2009 09:11:20 GMT j...@toerring.de (Jens Thoms Toerring) wrote:

JTT>     for ( split //, $s ) {
JTT>        $hash = ( $hash << 4 ) + ord $_;
...
JTT> The Perl version as it is will happily accept e.g. Unicode
JTT> characters and then the result will differ from the C/C++ version!

An easy way to fix that is to replace the lines with

for ( unpack "C*", $s) {
        $hash = ( $hash << 4 ) + $_;

which will probably also be faster, a nice property for a hashing
function (unpack over split, and no ord() call).

Ted


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jens Thoms Toerring  
View profile  
 More options Feb 25 2009, 4:52 pm
Newsgroups: comp.lang.perl.misc
From: j...@toerring.de (Jens Thoms Toerring)
Date: 25 Feb 2009 21:52:37 GMT
Local: Wed, Feb 25 2009 4:52 pm
Subject: Re: Trying to convert small C++ subroutine (by Peter Weinberger) to Perl

Ted Zlatanov <t...@lifelogs.com> wrote:
> On 24 Feb 2009 09:11:20 GMT j...@toerring.de (Jens Thoms Toerring) wrote:
> JTT>     for ( split //, $s ) {
> JTT>        $hash = ( $hash << 4 ) + ord $_;
> ...
> JTT> The Perl version as it is will happily accept e.g. Unicode
> JTT> characters and then the result will differ from the C/C++ version!
> An easy way to fix that is to replace the lines with
> for ( unpack "C*", $s) {
>         $hash = ( $hash << 4 ) + $_;
> which will probably also be faster, a nice property for a hashing
> function (unpack over split, and no ord() call).

Is a char guaranteed to be unsigned in C++? I'm only sure it
isn't in C, so the behaviour of the original function (which
could be C or C++) wouldn't be well-defined at least in C for
anything than 7-bit (ASCII) chars as the input.

Using unpack() is definitely more elegant (and more in the
spirit of the original function should there be Unicode chars
in the string). All one would need to know is if "C*" or "c*"
would have to be used with unpack().

                            Regards, Jens
--
  \   Jens Thoms Toerring  ___      j...@toerring.de
   \__________________________      http://toerring.de


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »