
Reading file bit by bit


Alfred Bovin

Jun 7, 2010, 4:57:53 AM
Hi all.

I'm working on something where I need to read a (binary) file bit by bit and
do something depending on whether the bit is 0 or 1.

Any help on doing the actual file reading is appreciated.

Thanks in advance


Ulrich Eckhardt

Jun 7, 2010, 5:12:05 AM
Alfred Bovin wrote:
> I'm working on something where I need to read a (binary) file bit by bit
> and do something depending on whether the bit is 0 or 1.

Well, the smallest unit you can read is an octet/byte. You then check the
individual bits of the byte using binary masks.


f = open(...)
data = f.read()
for byte in data:
    for i in range(8):
        bit = 2**i & byte
        ...

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
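
For reference, a minimal sketch of the masking idea above. It assumes Python 3, where iterating over a bytes object already yields integers, so no ord() call is involved. The helper name set_bit_positions and the sample data are made up for illustration; they are not from the thread.

```python
def set_bit_positions(byte):
    """Return the positions (0 = least significant) of the bits set in byte."""
    # 1 << i is the mask for bit i; the result of & is non-zero if that bit is set.
    return [i for i in range(8) if byte & (1 << i)]

# Stand-in for f.read() on a file opened in "rb" mode.
data = bytes([0b11001010, 0b10101111])
for byte in data:
    print(byte, set_bit_positions(byte))
```

Note that the mask test gives a non-zero value (not necessarily 1) for a set bit, so compare against zero or use bool() rather than comparing with 1.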

Peter Otten

Jun 7, 2010, 5:17:27 AM
Alfred Bovin wrote:

> I'm working on something where I need to read a (binary) file bit by bit
> and do something depending on whether the bit is 0 or 1.
>
> Any help on doing the actual file reading is appreciated.

The logical unit in which files are written is the byte. You can split the
bytes into 8 bits...

>>> def bits(f):
...     while True:
...         b = f.read(1)
...         if not b: break
...         b = ord(b)
...         for i in range(8):
...             yield b & 1
...             b >>= 1
...
>>> with open("tmp.dat", "wb") as f: # create a file with some example data
...     f.write(chr(0b11001010)+chr(0b10101111))
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit
...
0
1
0
1
0
0
1
1
1
1
1
1
0
1
0
1

but that's a very inefficient approach. If you explain what you are planning
to do we can most certainly come up with a better alternative.

Peter
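
One way to reduce the per-byte overhead mentioned above is to read the file in larger chunks. This is only a sketch, assuming Python 3 and an arbitrary chunk size of 4096 bytes; the name bits_lsb_first is made up here, and an in-memory io.BytesIO stands in for a real file opened with "rb".

```python
import io

def bits_lsb_first(f, chunk_size=4096):
    """Yield the bits of a binary file object, least significant bit of each byte first."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:                  # empty bytes object means end of file
            break
        for byte in chunk:             # Python 3: each item is already an int
            for i in range(8):
                yield (byte >> i) & 1

sample = io.BytesIO(bytes([0b11001010, 0b10101111]))
print(list(bits_lsb_first(sample)))
# [0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
```

The bit order is the same as in the session above; only the number of read() calls changes.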

Richard Thomas

Jun 7, 2010, 5:31:08 AM

You're reading those bits backwards. You want to read the most
significant bit of each byte first...

Richard.

Ulrich Eckhardt

Jun 7, 2010, 6:20:11 AM
Ulrich Eckhardt wrote:
> data = f.read()
> for byte in data:
>     for i in range(8):
>         bit = 2**i & byte
>         ...

Correction: Of course you have to use ord() to get from the single-element
string ("byte" above) to its integral value first.

Lie Ryan

Jun 7, 2010, 6:20:29 AM

Of course that depends on the need of the OP?

Peter Otten

Jun 7, 2010, 6:28:36 AM
Richard Thomas wrote:

>>> def bits(f):
...     while True:
...         byte = f.read(1)
...         if not byte: break
...         byte = ord(byte)
...         for i in reversed(range(8)):
...             yield byte >> i & 1
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit,
...
1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1

Nobody

Jun 7, 2010, 6:41:45 AM
On Mon, 07 Jun 2010 02:31:08 -0700, Richard Thomas wrote:

> You're reading those bits backwards. You want to read the most
> significant bit of each byte first...

Says who?

There is no universal standard for bit-order.

Among bitmap image formats, XBM is LSB-first while BMP and PBM are
MSB-first. OpenGL reads or writes bitmap data in either order, controlled
by glPixelStorei().

Most serial communication links (e.g. RS-232, Ethernet) transmit the LSB
first, although there are exceptions (e.g. I2C uses MSB-first).
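
To make the two conventions concrete, here is a small sketch that splits one byte value in either order. It assumes Python 3; the function name byte_bits and its msb_first flag are invented for this illustration.

```python
def byte_bits(value, msb_first=True):
    """Return the 8 bits of an integer in range 0..255 in the requested order."""
    # Bit 7 is the most significant bit, bit 0 the least significant.
    order = range(7, -1, -1) if msb_first else range(8)
    return [(value >> i) & 1 for i in order]

print(byte_bits(0b11001010, msb_first=True))   # [1, 1, 0, 0, 1, 0, 1, 0]
print(byte_bits(0b11001010, msb_first=False))  # [0, 1, 0, 1, 0, 0, 1, 1]
```

Which order is "right" depends entirely on the format or protocol that produced the data.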

Ulrich Eckhardt

Jun 7, 2010, 7:20:06 AM
Nobody wrote:
> On Mon, 07 Jun 2010 02:31:08 -0700, Richard Thomas wrote:
>
>> You're reading those bits backwards. You want to read the most
>> significant bit of each byte first...
>
> Says who?

Says Python:

>>> bin(192)
'0x11000000'

That said, I totally agree that there is no inherently right way and I guess
Richard was just a smiley or two short in order to have correct markup in
his not-so-serious posting.

:^)

Peter Otten

Jun 7, 2010, 7:43:58 AM
Ulrich Eckhardt wrote:

> Nobody wrote:
>> On Mon, 07 Jun 2010 02:31:08 -0700, Richard Thomas wrote:
>>
>>> You're reading those bits backwards. You want to read the most
>>> significant bit of each byte first...
>>
>> Says who?
>
> Says Python:
>
> >>> bin(192)
> '0x11000000'

Hmm, if that's what /your/ Python says, here's mine to counter:

>>> bin(192)
'0_totally_faked_binary_00000011'

;)

Peter

Ulrich Eckhardt

Jun 7, 2010, 7:57:42 AM
Peter Otten wrote:

> Ulrich Eckhardt wrote:
>> Says Python:
>>
>> >>> bin(192)
>> '0x11000000'
>
> Hmm, if that's what /your/ Python says, here's mine to counter:
>
> >>> bin(192)
> '0_totally_faked_binary_00000011'

Argh! Of course one of my Pythons says '0b11000000' and not what I mistyped
above.... =(

Uli
*goes and hides under a stone*

superpollo

Jun 7, 2010, 8:07:05 AM
Ulrich Eckhardt ha scritto:

> Peter Otten wrote:
>> Ulrich Eckhardt wrote:
>>> Says Python:
>>>
>>> >>> bin(192)
>>> '0x11000000'
>> Hmm, if that's what /your/ Python says, here's mine to counter:
>>
>> >>> bin(192)
>> '0_totally_faked_binary_00000011'
>
> Argh! Of course one of my Pythons says '0b11000000' and not what I mistyped
> above.... =(

mine goes like this:

>>> bin(192)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bin' is not defined

Ulrich Eckhardt

Jun 7, 2010, 8:54:04 AM
superpollo wrote:
> mine goes like this:
>
> >>> bin(192)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'bin' is not defined

Yep, one of mine, too. The "bin" function was new in 2.6, as were binary
number literals ("0b1100").

Uli
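
For interpreters that do have these features, a short sketch of how bin(), binary literals and format() relate. The to_binary helper is a hypothetical stand-in showing what one could write by hand where bin() is missing; it is not part of any library.

```python
def to_binary(n, width=8):
    """Hypothetical hand-rolled equivalent of bin() without the '0b' prefix, zero-padded."""
    return "".join(str((n >> i) & 1) for i in range(width - 1, -1, -1))

value = 0b11000000                 # binary literal, same as 192
print(bin(value))                  # '0b11000000'
print(format(value, "08b"))        # '11000000'
print(to_binary(value))            # '11000000'
print(int("11000000", 2))          # 192
```

In all of these the most significant bit is written first, which is just the usual convention for writing numbers.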

Grant Edwards

Jun 7, 2010, 10:16:30 AM
On 2010-06-07, Richard Thomas <char...@gmail.com> wrote:

> You're reading those bits backwards. You want to read the most
> significant bit of each byte first...

Can you explain the reasoning behind that assertion?

--
Grant Edwards               grant.b.edwards        Yow! I can't decide which
                                  at               WRONG TURN to make first!!
                              gmail.com            I wonder if BOB GUCCIONE
                                                   has these problems!

Terry Reedy

Jun 7, 2010, 2:58:00 PM
to pytho...@python.org
On 6/7/2010 6:20 AM, Ulrich Eckhardt wrote:
> Ulrich Eckhardt wrote:
>> data = f.read()
>> for byte in data:
>>     for i in range(8):
>>         bit = 2**i & byte
>>         ...
>
> Correction: Of course you have to use ord() to get from the single-element
> string ("byte" above) to its integral value first.

In Py3 (OP did not specify), a binary file is read as bytes, which is a
sequence of ints, and one would have to not use ord() ;=)

tjr
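
A small sketch of that difference, assuming Python 3 for the printed values. The helper byte_values is a made-up name; it relies on bytearray, which yields integers when iterated in both Python 2.6+ and Python 3, so it sidesteps the ord() question.

```python
def byte_values(data):
    """Return the integer value of every byte in data (hypothetical helper)."""
    return list(bytearray(data))

data = b"\xca\xaf"            # stand-in for f.read() on a file opened with "rb"
print(byte_values(data))      # [202, 175]
print(data[0])                # 202 in Python 3: indexing bytes gives an int
print(data[0:1])              # b'\xca': slicing still gives a bytes object
```

In Python 3, calling ord() on data[0] raises a TypeError because the item is already an int.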

Martin

Jun 8, 2010, 5:47:19 AM

Hi,

Have you looked at the numpy libraries?

It would be very easy to do...

import numpy as np

f = open("something.bin", "rb")
data = np.fromfile(f, np.uint8)
data = np.where(data == 0, data * 5, data)

So in this example I am just saying: where data == 0, multiply by 5. This
avoids the need for slow loops as well.

Mart.
