Reading Binary Files

Mazin07

unread,

Sep 19, 2005, 10:18:38 PM9/19/05

to

I made a little PHP thingy where you can search for any string of
numbers in the first 8 million digits of pi.

However, they're currently stored in ASCII, which means it's 2bytes for
a digit, and it's horribly inefficient. I used perl to pack it as 2
digits a byte, basically storing it as hex without A-F.

My problem is, I need a fast way to search the file by the half byte.
Asides from reading all 8 million digits and converting it to hex, how
else can I do it?

Chung Leong

unread,

Sep 19, 2005, 11:49:01 PM9/19/05

to

Just compress the file with the zlib functions. The compression routine
will squeeze out most of the redundancy.

Steve

unread,

Sep 20, 2005, 9:44:06 AM9/20/05

to

> Just compress the file with the zlib functions. The compression routine
> will squeeze out most of the redundancy.

OT: Chung, I was about to tell you off for suggesting that an
irrational number like PI could be compressed but... luckily I paused
for a moment, and googled this
<http://thestarman.dan123.com/math/pi/picalcs.htm> that shows better
than 50% compression rates. So that told me.

Back to the OP, where you go from here depends entirely on what your
search algorithm is doing at the moment with the string version of PI.
If it is a brute force dumb search, it will not need much modification
but it will be slow.

You need to pack the search substring first in the same way as your PI
digits. You probably have to do a minimum of two searches: one to look
for odd-aligned substrings and one to look for even-aligned substrings.
Doing a search with a substring starting or ending on an odd-aligned
digit also means managing the "don't care" digits. You might be able to
devise a suitable regexp to handle all of this. Best of luck. Can you
tell me if my birthday is in there (19590318)?

---
Steve

Mazin07

unread,

Sep 20, 2005, 2:00:39 PM9/20/05

to

I haven't tried it yet, but I was figuring I could maybe unpack the
whole thing or partrs of it at a time into hex and then search it from
there.

BTW: 19590318 isn't in the first 8 million digits.
I guess you could try (the slow one) for yourself.
http://ericjiang.co.nr/pi.php

Steve

unread,

Sep 21, 2005, 12:27:43 PM9/21/05

to

> I haven't tried it yet, but I was figuring I could maybe unpack the
> whole thing or partrs of it at a time into hex and then search it from
> there.

Maybe you can get away with that but your performance could be
dreadful. You'd have to deal with searching across part boundaries,
sliding windows and all that jazz. But you will have to decide whether
to optimise execution time, or storage cost, or memory use, or
compromise on everything. What is your main constraint? Space on disk,
memory, execution time or complexity of algorithm?

Could be that I'm underestimating the resources you have: on my desktop
even a brute force search (strpos) is reasonably fast against an 8M
string, and preg_match is faster. However I haven't looked at what this
is doing to my resources. Your server might buckle under a heavy load
if searching in pi becomes the Next Big Thing.

---
Steve