Fast Linear Hex Search
flag
Messages 11 - 20 of 37 - Collapse all
/groups/adfetch?hl=en&adid=dfD_jQ4AAACmVauwJD5Gw0NG-xTN_DAk
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
1.  Mazin07  
View profile  
 More options Sep 23 2005, 9:06 pm
Newsgroups: comp.lang.perl.moderated
From: Mazin07 <mazi...@gmail.com>
Date: Sat, 24 Sep 2005 01:06:26 GMT
Local: Fri, Sep 23 2005 9:06 pm
Subject: Fast Linear Hex Search
I need a Perl script to search through a file to find specific hex
strings, NOT ASCII.  Ideally,  it should not read more of the file than
it needs, and to quit when it does find it. The file is about 4 mb, with
8 million hex digits.

How should I go about doing this in perl?  Obviously I don't want to
read the whole thing into memory, and I need to stay away from dodgy
functions that turn hex into ascii.

If anybody's interested, it is 8 million digits of pi, in hex form (2
digits per byte).  I already converted from ASCII into that packed
format, but now I need to search it.

There's a slow PHP version if you care.  http://ericjiang.co.nr/pi.php


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
2.  Jeff Schwab  
View profile  
 More options Sep 24 2005, 9:45 am
Newsgroups: comp.lang.perl.moderated
From: Jeff Schwab <jeffrey.sch...@rcn.com>
Date: Sat, 24 Sep 2005 09:45:24 -0400
Local: Sat, Sep 24 2005 9:45 am
Subject: Re: Fast Linear Hex Search

Mazin07 wrote:
> I need a Perl script to search through a file to find specific hex
> strings, NOT ASCII.  Ideally,  it should not read more of the file than
> it needs, and to quit when it does find it. The file is about 4 mb, with
> 8 million hex digits.

> How should I go about doing this in perl?  Obviously I don't want to
> read the whole thing into memory, and I need to stay away from dodgy
> functions that turn hex into ascii.

> If anybody's interested, it is 8 million digits of pi, in hex form (2
> digits per byte).  I already converted from ASCII into that packed
> format, but now I need to search it.

> There's a slow PHP version if you care.  http://ericjiang.co.nr/pi.php

What do you mean "in hex form?"  Do you just mean that each 4-bit nybble
represents an unsigned integer from 0 to 15?

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
3.  jerry davis  
View profile  
 More options Sep 24 2005, 10:41 am
Newsgroups: comp.lang.perl.moderated
From: jerry davis <jfd...@wi.rr.com>
Date: Sat, 24 Sep 2005 09:41:16 -0500
Local: Sat, Sep 24 2005 10:41 am
Subject: Re: Fast Linear Hex Search
On Friday 23 September 2005 08:06 pm, Mazin07 wrote:

> I need a Perl script to search through a file to find specific hex
> strings, NOT ASCII.  Ideally,  it should not read more of the file than
> it needs, and to quit when it does find it. The file is about 4 mb, with
> 8 million hex digits.

> How should I go about doing this in perl?  Obviously I don't want to
> read the whole thing into memory, and I need to stay away from dodgy
> functions that turn hex into ascii.

look at the functions sysopen and sysread
this will do the trick. on sysread you can specify how many bytes to read.

> If anybody's interested, it is 8 million digits of pi, in hex form (2
> digits per byte).  I already converted from ASCII into that packed
> format, but now I need to search it.

this sounds cool, I bought a book on pi, from B&N, and it had I think 10000
digits of pi.

> There's a slow PHP version if you care.  http://ericjiang.co.nr/pi.php

--
Registered Linux User: 275424
Today's Fortune: The only constant is change.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
4.  Ilya Zakharevich  
View profile  
 More options Sep 24 2005, 5:43 pm
Newsgroups: comp.lang.perl.moderated
From: Ilya Zakharevich <nospam-ab...@ilyaz.org>
Date: Sat, 24 Sep 2005 21:43:33 +0000 (UTC)
Local: Sat, Sep 24 2005 5:43 pm
Subject: Re: Fast Linear Hex Search
[A complimentary Cc of this posting was sent to
jerry davis
<jfd...@wi.rr.com>], who wrote in article <200509240941.16311.jfd...@wi.rr.com>:

> > If anybody's interested, it is 8 million digits of pi, in hex form (2
> > digits per byte).  I already converted from ASCII into that packed
> > format, but now I need to search it.

> this sounds cool, I bought a book on pi, from B&N, and it had I think 10000
> digits of pi.

 perl -MMath::Pari=:prec=10000,Pi -wle "print Pi"

AFAIK, the algorithm takes O(n^3/2).  To get a million of digits may
take several minutes on contemporary machines.  8e6 should be feasible
too; make take about hour (but if the cached size of memory is low,
can be much slower).  In both cases you may need to increase the size
of the PARI stack, see the docs.

Hope this helps,
Ilya


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
5.  Mazin07  
View profile  
 More options Sep 24 2005, 3:25 pm
Newsgroups: comp.lang.perl.moderated
From: Mazin07 <mazi...@gmail.com>
Date: Sat, 24 Sep 2005 19:25:28 GMT
Local: Sat, Sep 24 2005 3:25 pm
Subject: Re: Fast Linear Hex Search

jerry davis wrote:
> On Friday 23 September 2005 08:06 pm, Mazin07 wrote:

>>I need a Perl script to search through a file to find specific hex
>>strings, NOT ASCII.  Ideally,  it should not read more of the file than
>>it needs, and to quit when it does find it. The file is about 4 mb, with
>>8 million hex digits.

>>How should I go about doing this in perl?  Obviously I don't want to
>>read the whole thing into memory, and I need to stay away from dodgy
>>functions that turn hex into ascii.

> look at the functions sysopen and sysread
> this will do the trick. on sysread you can specify how many bytes to read.

I'm not too sure on the best way to go about searching it.  With
sysopen, would I have to incrementally read more bytes?  That would mean
looping over a lot - how would it affect the run speed?
How do I get it to give me hex instead of ascii?  I don't want to
continually convert ascii to hex.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
6.  Keith Thompson  
View profile  
 More options Sep 25 2005, 8:31 pm
Newsgroups: comp.lang.perl.moderated
From: Keith Thompson <ks...@mib.org>
Date: Mon, 26 Sep 2005 00:31:36 GMT
Local: Sun, Sep 25 2005 8:31 pm
Subject: Re: Fast Linear Hex Search

Mazin07 <mazi...@gmail.com> writes:
> I need a Perl script to search through a file to find specific hex
> strings, NOT ASCII.  Ideally,  it should not read more of the file
> than it needs, and to quit when it does find it. The file is about 4
> mb, with 8 million hex digits.

> How should I go about doing this in perl?  Obviously I don't want to
> read the whole thing into memory, and I need to stay away from dodgy
> functions that turn hex into ascii.

> If anybody's interested, it is 8 million digits of pi, in hex form (2
> digits per byte).  I already converted from ASCII into that packed
> format, but now I need to search it.

If it's 2 digits per byte, it's not in hex form.  Hexadecimal is a
(typically ASCII) text representation of base-16 numbers, using the
characters '0'..'9', 'a'..'f'.  It typically takes 8 bits per
hexadecimal digit.

The format you're describing sounds like plain binary.  Or is it
binary-coded decimal?

If it's binary-coded decimal, it's going to look something like this:

    0011 0001 0100 0001 0101 1001 ...
    (3)  (1)  (4)  (1)  (5)  (9)  ...

--
Keith Thompson (The_Other_Keith) ks...@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
7.  Mazin07  
View profile  
 More options Sep 26 2005, 7:01 pm
Newsgroups: comp.lang.perl.moderated
From: Mazin07 <mazi...@gmail.com>
Date: Mon, 26 Sep 2005 23:01:41 GMT
Local: Mon, Sep 26 2005 7:01 pm
Subject: Re: Fast Linear Hex Search

Binary-coded decimal actually. I was referring to it like that because
the digits 1 and 2 would show as 12 in hex.

so.. (3.) 14159265 would be

14 15 92 65
when viewed in a hex editor.

So what's the best function to search for strings of numbers in that,
asides from turning it into an ascii string in memory?  What kind of
functions does Perl have that would be suited to that?  I searched
perl's manual, but didn't find anything.

Or would it be better to just expand the whole thing into human-readable
ASCII string in memory?  It seems like that would take all the advantage
out of it.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
8.  Yitzchak Scott-Thoennes  
View profile  
 More options Sep 27 2005, 6:43 am
Newsgroups: comp.lang.perl.moderated
From: Yitzchak Scott-Thoennes <sthoe...@efn.org>
Date: Tue, 27 Sep 2005 03:43:56 -0700
Local: Tues, Sep 27 2005 6:43 am
Subject: Re: Fast Linear Hex Search

If you mean you want to search for a string of digits like 14159265
and there may be an odd number of digits, or you want to find them
whether they start at an odd or even position, you have enormously
complicated things by packing your data that way.  You could come
up with a regex that would work, but it's much less trouble just
to have your file not packed.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
9.  Ilya Zakharevich  
View profile  
 More options Sep 27 2005, 5:19 pm
Newsgroups: comp.lang.perl.moderated
From: Ilya Zakharevich <nospam-ab...@ilyaz.org>
Date: Tue, 27 Sep 2005 21:19:44 +0000 (UTC)
Local: Tues, Sep 27 2005 5:19 pm
Subject: Re: Fast Linear Hex Search
[A complimentary Cc of this posting was sent to
Yitzchak Scott-Thoennes
<sthoe...@efn.org>], who wrote in article <20050927104356.GA3...@efn.org>:

> If you mean you want to search for a string of digits like 14159265
> and there may be an odd number of digits, or you want to find them
> whether they start at an odd or even position, you have enormously
> complicated things by packing your data that way.

"Enormously"???  I doubt it.  Fixed strings are easy:

  /\x14\x15\x92\x65/ || /[\x01...\xF1]\x41\x59\x26[\x50-\x5F]/

(of course, this leading [\x01...\xF1] should be completely spelled out).

What *is* enormously complicated is looking for regular expressions in
the expanded string - without expansion...

Hope this helps,
Ilya


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
10.  Yitzchak Scott-Thoennes  
View profile  
 More options Sep 27 2005, 7:50 pm
Newsgroups: comp.lang.perl.moderated
From: Yitzchak Scott-Thoennes <sthoe...@efn.org>
Date: Tue, 27 Sep 2005 16:50:51 -0700
Local: Tues, Sep 27 2005 7:50 pm
Subject: Re: Fast Linear Hex Search
mailed and posted

I didn't mean to imply that it was objectively difficult, just that it
increases the complexity of a solution by a factor of 3 or more.

Just a routine to compute the regex to use for a given series of digits
is going to be as complicated as the entire code needed to search the file
for the digits in ascii form.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2013 Google