I'm working on something where I need to read a (binary) file bit by bit and
do something depending on whether the bit is 0 or 1.
Any help on doing the actual file reading is appreciated.
Thanks in advance
Well, the smallest unit you can read is an octet/byte. You then check the
individual bits of the byte using binary masks:
with open(..., "rb") as f:
    data = f.read()
for byte in data:
    for i in range(8):
        bit = 1 if byte & (1 << i) else 0  # mask out bit i (i == 0 is the least significant bit)
        ...
Uli
--
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
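A self-contained Python 3 sketch of the mask approach described above. The function name `iter_bits_lsb_first` and the use of a temporary file are illustrative assumptions. Like the loop above, it tests bit 0 (the least significant) first.

```python
import os
import tempfile


def iter_bits_lsb_first(path):
    """Yield the bits of the file at `path` as 0/1 ints, LSB of each byte first."""
    with open(path, "rb") as f:
        data = f.read()              # in Python 3 this is a bytes object
    for byte in data:                # iterating over bytes yields ints 0..255
        for i in range(8):
            mask = 1 << i            # same value as 2**i
            yield 1 if byte & mask else 0


# Usage: write two example bytes to a temporary file and read them back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes([0b11001010, 0b10101111]))
bits = list(iter_bits_lsb_first(path))
os.remove(path)
print(bits)
```

Reading the whole file with one `read()` call is simple but holds the entire file in memory.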
> I'm working on something where I need to read a (binary) file bit by bit
> and do something depending on whether the bit is 0 or 1.
>
> Any help on doing the actual file reading is appreciated.
The logical unit in which files are written is the byte. You can split the
bytes into 8 bits...
>>> def bits(f):
...     while True:
...         b = f.read(1)
...         if not b: break
...         b = ord(b)
...         for i in range(8):
...             yield b & 1
...             b >>= 1
...
>>> with open("tmp.dat", "wb") as f: # create a file with some example data
...     f.write(chr(0b11001010)+chr(0b10101111))
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit
...
0
1
0
1
0
0
1
1
1
1
1
1
0
1
0
1
But that's a very inefficient approach. If you explain what you are planning
to do, we can almost certainly come up with a better alternative.
Peter
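One possible way to cut the per-bit overhead, sketched here under assumptions (Python 3.3+ for `yield from`; the names `BIT_TABLE` and `iter_bits_chunked` are hypothetical, not from the thread), is to read the file in large chunks and look up a precomputed tuple of 8 bits per byte value instead of shifting and masking each bit. It is likely faster than one `read(1)` call per byte, but no timings were measured here.

```python
import io

# Precompute, once, the 8 bits (MSB first) of every possible byte value.
BIT_TABLE = [tuple((value >> i) & 1 for i in range(7, -1, -1)) for value in range(256)]


def iter_bits_chunked(f, chunk_size=64 * 1024):
    """Yield bits (MSB first) from binary file object `f`, reading `chunk_size` bytes at a time."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:               # Python 3: ints 0..255
            yield from BIT_TABLE[byte]   # one table lookup per byte


# Usage with an in-memory file standing in for a real one;
# chunk_size=1 just exercises the chunk boundaries.
sample = io.BytesIO(bytes([0b11001010, 0b10101111]))
bits = list(iter_bits_chunked(sample, chunk_size=1))
print("".join(map(str, bits)))   # 1100101010101111
```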
You're reading those bits backwards. You want to read the most
significant bit of each byte first...
Richard.
Correction: Of course you have to use ord() to get from the single-element
string ("byte" above) to its integral value first.
That depends on the needs of the OP, of course. Reading the most significant
bit first only takes a small change:
>>> def bits(f):
...     while True:
...         byte = f.read(1)
...         if not byte: break
...         byte = ord(byte)
...         for i in reversed(range(8)):
...             yield byte >> i & 1
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit,
...
1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1
> You're reading those bits backwards. You want to read the most
> significant bit of each byte first...
Says who?
There is no universal standard for bit-order.
Among bitmap image formats, XBM is LSB-first while BMP and PBM are
MSB-first. OpenGL reads or writes bitmap data in either order, controlled
by glPixelStorei().
Most serial communication links (e.g. RS-232, Ethernet) transmit the LSB
first, although there are exceptions (e.g. I2C uses MSB-first).
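Since neither order is universally right, one option is to make the bit order an explicit parameter rather than baking it in. A minimal Python sketch; `byte_to_bits` and `msb_first` are illustrative names, not an existing API:

```python
def byte_to_bits(value, msb_first=True):
    """Return the 8 bits of `value` (0..255) as a list of 0/1 ints."""
    order = range(7, -1, -1) if msb_first else range(8)
    return [(value >> i) & 1 for i in order]


b = 0b11001010
print(byte_to_bits(b, msb_first=True))    # [1, 1, 0, 0, 1, 0, 1, 0]  (e.g. BMP/PBM style)
print(byte_to_bits(b, msb_first=False))   # [0, 1, 0, 1, 0, 0, 1, 1]  (e.g. XBM style)
```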
Says Python:
>>> bin(192)
'0x11000000'
That said, I totally agree that there is no inherently right way and I guess
Richard was just a smiley or two short in order to have correct markup in
his not-so-serious posting.
:^)
> Nobody wrote:
>> On Mon, 07 Jun 2010 02:31:08 -0700, Richard Thomas wrote:
>>
>>> You're reading those bits backwards. You want to read the most
>>> significant bit of each byte first...
>>
>> Says who?
>
> Says Python:
>
>>>> bin(192)
> '0x11000000'
Hmm, if that's what /your/ Python says, here's mine to counter:
>>> bin(192)
'0_totally_faked_binary_00000011'
;)
Peter
Argh! Of course one of my Pythons says '0b11000000' and not what I mistyped
above.... =(
Uli
*goes and hides under a stone*
Mine goes like this:
>>> bin(192)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'bin' is not defined
Yep, one of mine, too. The "bin" function was new in 2.6, as were binary
number literals ("0b1100").
Uli
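On Pythons older than 2.6, which lack bin(), a hand-rolled helper can produce the same string. This is a sketch: `to_binary` is a hypothetical name, and unlike bin() it only handles non-negative integers. It sticks to constructs that also work on old 2.x interpreters.

```python
def to_binary(n):
    """Return a bin()-style string ('0b...') for a non-negative int, without using bin()."""
    if n == 0:
        return "0b0"
    digits = []
    while n:
        digits.append("01"[n & 1])   # collect bits, least significant first
        n >>= 1
    digits.reverse()                 # bin() prints the most significant bit first
    return "0b" + "".join(digits)


print(to_binary(192))   # 0b11000000
```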
> You're reading those bits backwards. You want to read the most
> significant bit of each byte first...
Can you explain the reasoning behind that assertion?
--
Grant Edwards               grant.b.edwards at gmail.com
Yow! I can't decide which WRONG TURN to make first!!
I wonder if BOB GUCCIONE has these problems!
In Py3 (the OP did not specify), a binary file is read as bytes, which is a
sequence of ints, so there is no need for ord() ;=)
tjr
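For illustration, here is the MSB-first `bits()` generator from earlier in the thread ported to Python 3, assuming the file is opened in binary mode. Indexing a bytes object gives an int directly, so the ord() call disappears. An in-memory `io.BytesIO` stands in for a real file.

```python
import io


def bits(f):
    """Yield bits MSB first from a binary file object (Python 3 port, no ord())."""
    while True:
        chunk = f.read(1)        # a bytes object of length 0 or 1
        if not chunk:
            break
        byte = chunk[0]          # indexing bytes yields an int in Python 3
        for i in reversed(range(8)):
            yield (byte >> i) & 1


f = io.BytesIO(bytes([0b11001010, 0b10101111]))
print(*bits(f))   # 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1
```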
Hi,
Have you looked at the NumPy library?
It would be very easy to do...
import numpy as np
f = open("something.bin", "rb")
data = np.fromfile(f, np.uint8)
data = np.where(data == 0, data * 5, data)
So in this example I am just saying: where data == 0, multiply by 5. This
also avoids the need for slow Python loops.
Mart.
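Note that `np.fromfile(f, np.uint8)` gives one element per byte, not per bit; NumPy's `np.unpackbits` expands a uint8 array into its individual bits (MSB first by default). If NumPy is not available, a standard-library sketch along the same "no per-bit Python loop" lines is to turn the whole buffer into one integer and format it as a zero-padded binary string. `bytes_to_bitstring` is a hypothetical name; it needs Python 3.2+ for `int.from_bytes`, and it builds 8 characters per input byte, so large files would need to be processed in chunks.

```python
def bytes_to_bitstring(data):
    """Return all bits of `data` as a '0'/'1' string, MSB of the first byte first."""
    if not data:
        return ""
    n = int.from_bytes(data, "big")                  # whole buffer as one integer
    return format(n, "0{}b".format(8 * len(data)))   # zero-pad to 8 bits per byte


s = bytes_to_bitstring(bytes([0b11001010, 0b10101111]))
print(s)   # 1100101010101111
```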