
Reading a binary file


Sorin Marti

Jun 26, 2003, 4:01:06 AM
Hi all,

I am quite new to python and very new to this list.

I've got the following problem. I have a binary file which contains
information I should read. I can open the file with

f = open ('cpu1db2.dat', 'rb')

That's no problem. But now I need the hex values of the binary file.

Is there a way to show the hex values of the bytes of a file?


Thanks in advance
Sorin


Andrew Bennetts

Jun 26, 2003, 4:33:05 AM
On Thu, Jun 26, 2003 at 10:01:06AM +0200, Sorin Marti wrote:
> Hi all,
>
> I am quite new to python and very new to this list.
>
> I've got following problem. I have a binary file which contains
> information I should read. I can open the file with
>
> f = open ('cpu1db2.dat', 'rb')
>
> That's no problem. But now I need the hex values of the binary file.

You can get the hex value of a 1-character string with hex(ord(char)), e.g.:

>>> char = 'a'
>>> hex(ord(char))
'0x61'

But I'm guessing that you might find the 'struct' module even more useful:
http://python.org/doc/current/lib/module-struct.html
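
A minimal sketch of the same idea applied to a whole file. It is written for Python 3 (not the Python 2 of this thread), where iterating over bytes already yields integers, so ord() is not needed. The helper name hex_values and the demo file contents are made up for illustration.

```python
import os
import tempfile


def hex_values(path):
    """Return the bytes of a file as a list of two-digit hex strings."""
    with open(path, 'rb') as f:
        data = f.read()
    # In Python 3, iterating over bytes yields ints, so no ord() is needed.
    return ['%02x' % b for b in data]


# Demonstration on a throwaway file (the three bytes are made up).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x01\x11P')
    demo_path = tmp.name

demo_hex = hex_values(demo_path)
os.remove(demo_path)
print(demo_hex)  # ['01', '11', '50']
```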

-Andrew.


Sorin Marti

Jun 26, 2003, 5:33:20 AM
Hi Andrew,

Thanks for your answer!

Andrew Bennetts wrote:
> On Thu, Jun 26, 2003 at 10:01:06AM +0200, Sorin Marti wrote:
>
>>But now I need the hex values of the binary file.
>
> You can get the hex value of a 1-character string with hex(ord(char)), e.g.:
>
> >>> char = 'a'
> >>> hex(ord(char))
> '0x61'
>

That is not exactly what I meant. I've found a solution (a is the binary
data):

b = binascii.hexlify(a)

For example, it gives me C8, which is a hex value. How do I change this
into a decimal? (The decimal should be 130, right?)

Thanks in advance

Sorin


Steve Holden

Jun 26, 2003, 8:37:39 AM
"Sorin Marti" <m...@semafor.ch> wrote in message
news:mailman.1056620083...@python.org...

If a is the "binary data" (i.e. a one-byte string) then ord(a) is indeed the
answer you want:

>>> import binascii
>>> for c in "ab12+":
...     print binascii.hexlify(c), ord(c)
...
61 97
62 98
31 49
32 50
2b 43
>>>

Or is there something else you aren't telling us?

regards
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/

Steven Taschuk

Jun 26, 2003, 8:50:15 AM
Quoth Sorin Marti:
[...]

> That is not exactly what I meant. I've found a solution (a is the binary
> data):
>
> b = binascii.hexlify(a)
>
> For example it gives me C8 which is a HEX-Value. How to change this one
> into a decimal? (The decimal should be 130, right?)

200, actually; hex C = 12, so hex C8 = 12*16 + 8 = 200.
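
The arithmetic can be checked directly with int() and base 16. A small sketch (Python 3 syntax):

```python
# int() with base 16 parses hex digits; the result is a plain integer.
value = int('C8', 16)
by_hand = 12 * 16 + 8

print(value)          # 200
print(by_hand)        # 200, the same arithmetic done by hand
print(int('82', 16))  # 130, so decimal 130 would be hex 82, not C8
```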

Each byte represents a number in the range [0..255]. Those
numbers may be obtained with the ord function, as Andrew noted:

>>> bytes = file('foo.py', 'rb').read(10)
>>> bytes
'def inplac'
>>> map(ord, bytes)
[100, 101, 102, 32, 105, 110, 112, 108, 97, 99]

These are numbers, so you can, e.g., do arithmetic with them.
They are neither hex numbers nor decimal numbers; 'hex' and
'decimal' describe notations by which numbers may be represented,
not numbers themselves. So, when you ask for the hex value, it
seems you want a string containing characters which represent the
number in hexadecimal notation. The hex function does that, again
as Andrew noted:

>>> numbers = map(ord, bytes)
>>> [hex(num)[2:] for num in numbers]
['64', '65', '66', '20', '69', '6e', '70', '6c', '61', '63']

(I've cut out the leading '0x' which this function produces.) As
you've discovered, binascii.hexlify does this all in one step --
turns a string containing bytes into a string containing
hexadecimal digits representing the bytes:

>>> import binascii
>>> binascii.hexlify(bytes)
'64656620696e706c6163'

If, for some reason, you wanted to obtain numbers from such a
string, you could do it this way, for example:

>>> def hex2numbers(s):
...     numbers = []
...     for i in range(0, len(s), 2):
...         numbers.append(int(s[i:i+2], 16))
...     return numbers
...
>>> hex2numbers(binascii.hexlify(bytes))
[100, 101, 102, 32, 105, 110, 112, 108, 97, 99]
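
For readers on Python 3, where bytes and text strings are separate types, the steps above might look like the sketch below. It is an equivalent written under that assumption, not part of the original session; the ten sample bytes are the ones shown above.

```python
import binascii

data = b'def inplac'                 # the same ten bytes as in the session above

numbers = list(data)                 # bytes iterate as ints in Python 3
hex_digits = [format(n, '02x') for n in numbers]
hex_string = data.hex()              # hex digits as a str
hexlified = binascii.hexlify(data)   # same digits, but as a bytes object
round_trip = list(bytes.fromhex(hex_string))

print(numbers)
print(hex_digits)
print(hex_string)
print(round_trip == numbers)         # True
```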

--
Steven Taschuk stas...@telusplanet.net
"I tried to be pleasant and accommodating, but my head
began to hurt from his banality." -- _Seven_ (1996)

Andrew Bennetts

Jun 26, 2003, 8:55:01 AM
On Thu, Jun 26, 2003 at 11:33:20AM +0200, Sorin Marti wrote:
> Andrew Bennetts wrote:
> >On Thu, Jun 26, 2003 at 10:01:06AM +0200, Sorin Marti wrote:
> >
> >>But now I need the hex values of the binary file.
> >
> >You can get the hex value of a 1-character string with hex(ord(char)),
> >e.g.:
> >
> > >>> char = 'a'
> > >>> hex(ord(char))
> > '0x61'
> >
>
> That is not exactly what I meant. I've found a solution (a is the binary
> data):
>
> b = binascii.hexlify(a)
>
> For example it gives me C8 which is a HEX-Value. How to change this one
> into a decimal? (The decimal should be 130, right?)

I think you might be confused about how bytes and numbers are related. Have
a look at this:

>>> char = 'a'
>>> ord(char)
97
>>> type(ord(char))
<type 'int'>
>>> type(hex(ord(char)))
<type 'str'>

So I'm guessing you don't really want the hex representation at all!

A quick way to convert a string of bytes into the corresponding numerical
values is:

>>> s = 'hello'
>>> map(ord, s)
[104, 101, 108, 108, 111]

or:

>>> s = 'hello'
>>> [ord(char) for char in s]
[104, 101, 108, 108, 111]

And again, I *strongly* suggest you look at the struct module -- I'm not
sure what you're trying to do, but if you're trying to interpret binary data
into numbers and things, it's almost certainly helpful:

http://python.org/doc/current/lib/module-struct.html

e.g.:

>>> import struct
>>> s = 'hello'
>>> struct.unpack('5B', s)
(104, 101, 108, 108, 111)

-Andrew.


Peter Hansen

Jun 26, 2003, 9:48:09 AM
Sorin Marti wrote:
>
> I am quite new to python and very new to this list.
>
> I've got following problem. I have a binary file which contains
> information I should read. I can open the file with
[snip]

It would really be best if you could describe in more detail
what you are trying to do with this data. Bytes are bytes,
and things like hex and binary are just different _representations_
of bytes, so whether you want binary, hex, decimal, or something
else depends entirely on the use to which you will put the info.
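
To illustrate the point about representations, here is a small sketch (Python 3) that prints one and the same byte value in three notations:

```python
value = 0xC8  # one byte value, written here in hex notation

as_decimal = str(value)
as_hex = format(value, '02x')
as_binary = format(value, '08b')

print(as_decimal)  # 200
print(as_hex)      # c8
print(as_binary)   # 11001000

# All three spellings denote the same integer.
print(value == 200 == 0b11001000)  # True
```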

-Peter

Sorin Marti

Jun 26, 2003, 10:14:58 AM

Hi Peter,

Ok I'll try to give more details. I have a Siemens SPS (a PLC). With an SPS you
can control machines such as pumps or motors or anything else. To
control them you have to set variables. If you want to see which state these
variables have, you can get a file via FTP where these values are stored.
This is what I have done. Now I have a file (called cpu1db2.dat) and
this file has a length of 16 bytes.

Byte Number/Length    Type         Hex-Value
----------------------------------------------------------------
Byte 1:               Boolean      01  (which is true, 00 would be false)
Byte 2:               Byte         11  (This data type is called byte)
Byte 3:               Char         50  (Which should be a "P")
Byte 4,5:             Word         00 00
Byte 6,7:             Integer      22 04
Byte 8,9,10,11:       DoubleWord   D2 00 00 BB
Byte 12,13,14,15,16:  Real         BB 42 C8 00 00


So I have written a Python class which connects to the FTP server (on the
SPS) and gets the file. It has a function with which you can fetch a value
given a start byte and an end byte; you also have to specify the type. That
means you can call getValue('REAL',12,16) and you should get back 100,
because the binary value of 'BB 42 C8 00 00' is
01000010110010000000000000000000. The first digit is the sign (+ or -) and
the next 8 digits are the exponent, in this case 10000101 = 133 decimal.
Subtract 127 from 133 and you get six; that's the exponent. The rest
(110010000000000000000000) has a hex value of C80000, which is 13107200
decimal. Now you multiply 13107200 by 2^6 and 2^-23 and you
get (tataaaaaa!): 100!
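
The calculation described above can be sketched in Python 3 as follows. This assumes the four bytes 42 C8 00 00 use the usual single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored mantissa bits with an implied leading 1); that assumption comes from the description, not from any Siemens documentation.

```python
raw = 0x42C80000                  # the four bytes 42 C8 00 00 as one integer

sign_bit = raw >> 31              # first bit
exponent = (raw >> 23) & 0xFF     # next 8 bits
fraction = raw & 0x7FFFFF         # remaining 23 bits
mantissa = fraction | 0x800000    # restore the implied leading 1

value = (-1) ** sign_bit * mantissa * 2.0 ** (exponent - 127) * 2.0 ** -23
print(exponent, hex(mantissa), value)  # 133 0xc80000 100.0
```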

The different data types need different calculations, that's why I asked
a few things about changing the representation because I only can do
some things in binary mode or hex mode.

Cheers
Sorin Marti


Peter Hansen

Jun 26, 2003, 11:15:50 AM
Sorin Marti wrote:
>
> Ok I'll try to give more details. I have a Siemens SPS. With an SPS you
> can controll machines such as pumps or motors or anything else. To
> controll you have to set Variables. If you want to see which state these
> variables have you can get a file via ftp where these values are stored.
> This is what I have done. Now I have a file (called cpu1db2.dat) and
> this file has a length of 16 bytes.
>
> Byte Number/Length    Type         Hex-Value
> ----------------------------------------------------------------
> Byte 1:               Boolean      01  (which is true, 00 would be false)
> Byte 2:               Byte         11  (This data type is called byte)
> Byte 3:               Char         50  (Which should be a "P")
> Byte 4,5:             Word         00 00
> Byte 6,7:             Integer      22 04
> Byte 8,9,10,11:       DoubleWord   D2 00 00 BB
> Byte 12,13,14,15,16:  Real         BB 42 C8 00 00

Excellent detail! (It's a pleasure to help someone who actually takes
the time to put together a question with this much care! Thank you. :-)

> Then there is a function where you can call a value with a startbyte and
> an endbyte. You also have to specify the type. That means you can call
> getValue('REAL',12,16) and you should get back 100 because if you have
> the binary value of 'BB 42 C8 00 00' is 01000010110010000000000000000000
> , first digit is the Sign (which is + or - ), next 8 digits are the
> exponent, in this case 10000101 = 133dec. Now you take away 127 from 133
> then you get six, thats the exponent. The rest
> (110010000000000000000000) has a hex value of C80000 this is 13107200
> decimal. Now you have to multiply 13107200 with 2^6 and 2^-23 and you
> get (tataaaaaa!): 100!
>
> The different data types need different calculations, that's why I asked
> a few things about changing the representation because I only can do
> some things in binary mode or hex mode.

Okay, so clearly you understand about bytes and such.... you just need
help with the specific ways of doing such things with Python. (?)

Folks have already shown you how to do hex(ord(abyte)) if you have a single
byte out of the above string of 16 bytes... That will return a
representation starting with 0x, however, so maybe ("%02x" % ord(abyte))
is more what you would need. You can also extend that to ("%04x" % word)
or %08x for a long if you need.
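
A short sketch of those format specifiers (Python 3 syntax), using sample values taken from the table in the quoted post:

```python
byte, word, dword = 0x11, 0x2204, 0xD20000BB   # sample values from the table

with_prefix = hex(byte)         # includes the leading 0x
two_digits = '%02x' % byte
four_digits = '%04x' % word
eight_digits = '%08X' % dword   # a capital X gives upper-case digits

print(with_prefix)   # 0x11
print(two_digits)    # 11
print(four_digits)   # 2204
print(eight_digits)  # D20000BB
```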

More likely, the comments about using the struct module are right on
target. You could easily write a format string that would convert the entire
16 byte package all at once, except for your proprietary (?) float
format, which you already have under control.

Check out struct, then if you still need help, we'll be down to
specifics.

-Peter

Axel Bock

Jun 26, 2003, 11:19:43 AM
On Thu, 26 Jun 2003 16:14:58 +0200, Sorin Marti wrote:

> This is what I have done. Now I have a file (called cpu1db2.dat) and
> this file has a length of 16 bytes.
>
> Byte Number/Length Type Hex-Value
> ----------------------------------------------------------------

> [... content description ...]


> So I have written a python class which makes a connection to the

> [... lots of strange calculation ...]


> 10000101 = 133dec. Now you take away 127 from 133 then you get six,
> thats the exponent. The rest (110010000000000000000000) has a hex value
> of C80000 this is 13107200 decimal. Now you have to multiply 13107200
> with 2^6 and 2^-23 and you get (tataaaaaa!): 100!

whew. I don't get it, but anyways I think I can be useful ;-)

look at the struct-module: "This module performs conversions between
Python values and C structs represented as Python strings. It uses format
strings (explained below) as compact descriptions of the lay-out of the C
structs and the intended conversion to/from Python values. This can be
used in handling binary data stored in files or from network connections,
among other sources." (out of the python-doc).

learning by example:

to convert a 4 byte integer you'd write:
"struct.unpack("=I",str_of_len_4)[0]"

= means native endian format - use the machine's endian format
I means unsigned int
str_of_len_4 has to be a binary string of length 4 containing the int
and the [0] at the end is necessary because unpack *always* returns a tuple,
even if only one value is converted (otherwise it'd be (2387,), for example).
this is even stackable:
"struct.unpack("=IIH", str_of_len_10)"
converts two unsigned ints, one unsigned short, and returns all three in one
tuple. great, huh? :-)
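
A runnable sketch of the two calls (Python 3). The sample numbers are made up, and struct.pack is used only to build some input bytes to unpack again:

```python
import struct

# Build 10 bytes of sample data: two unsigned ints and one unsigned short.
packed = struct.pack('=IIH', 2387, 7, 513)

# Unpacking a single value still gives a 1-tuple, hence the [0].
first = struct.unpack('=I', packed[:4])[0]

# Unpacking several values gives them all in one tuple.
everything = struct.unpack('=IIH', packed)

print(len(packed), first, everything)  # 10 2387 (2387, 7, 513)
```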

Hope this is what you need. your explanations seemed rather complicated to
me ;-)


greetings,

axel.

Peter Hansen

Jun 26, 2003, 11:28:52 AM
Sorin Marti wrote:
>
> Byte Number/Length Type Hex-Value
> ----------------------------------------------------------------
> Byte 12,13,14,15,16: Real BB 42 C8 00 00
>
> you can call
> getValue('REAL',12,16) and you should get back 100 because if you have
> the binary value of 'BB 42 C8 00 00' is 01000010110010000000000000000000
> , first digit is the Sign (which is + or - ), next 8 digits are the
> exponent, in this case 10000101 = 133dec. Now you take away 127 from 133
> then you get six, thats the exponent. The rest
> (110010000000000000000000) has a hex value of C80000 this is 13107200
> decimal. Now you have to multiply 13107200 with 2^6 and 2^-23 and you
> get (tataaaaaa!): 100!

I think you might be interpreting (or explaining?) the format of that
real incorrectly.

If the first bit is the sign, and the next 8 bits are the
exponent, and the rest is mantissa, then your exponent should
be 01110110 (or h76 or d118) and your mantissa value in hex
would be all of the 42 C8 00 00, or 1120403456 in decimal.

(Basically, your binary value as shown is wrong. BB42C80000 is really
1011 1011 0100 0010 1100 1000 0000 0000 0000 0000 and not your
value of 0100 0010 1100 1000 0000 0000 0000 0000 as shown above.)
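
The bit patterns are easy to check mechanically. A small Python 3 sketch (the helper name bit_groups is made up for illustration):

```python
five_bytes = bytes.fromhex('BB42C80000')
four_bytes = bytes.fromhex('42C80000')


def bit_groups(data):
    """Return the bits of data as a string, grouped in fours."""
    bits = ''.join(format(b, '08b') for b in data)
    return ' '.join(bits[i:i + 4] for i in range(0, len(bits), 4))


print(bit_groups(five_bytes))
# 1011 1011 0100 0010 1100 1000 0000 0000 0000 0000
print(bit_groups(four_bytes))
# 0100 0010 1100 1000 0000 0000 0000 0000
```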

-Peter

Anand Pillai

Jun 26, 2003, 11:48:33 AM
You need to convert the hex to int with radix 16.

import binascii

c='a'
h=binascii.hexlify(c)
d=int(h, 16)

Anand Pillai


Sorin Marti <m...@semafor.ch> wrote in message news:<mailman.1056620083...@python.org>...

Peter Abel

Jun 26, 2003, 5:52:50 PM
Sorin Marti <m...@semafor.ch> wrote in message news:<mailman.1056637113...@python.org>...

As some others described the struct module should do the right work:
>>> import struct
### This should be your data to read from a file into a string x.
>>> x='\x01\x11P\x00\x00\x22\x04\xd2\x00\x00\xbb\xbb\x42\xc8\x00\x00'
### The format to decode your data except the Real.
### The **>** is necessary because your data come big-endian.
>>> decode_fmt='>BBcHHI'
>>> (Boolean,Byte,Char,Word,Integer,DoubleWord)=struct.unpack(decode_fmt,x[:-5])
>>> format="""
... Boolean : %02X
... Byte : %02X
... Char : %s
... Word : %04X
... Integer : %04X
... DoubleWord: %08X"""
>>> print format%(Boolean,Byte,Char,Word,Integer,DoubleWord)

Boolean : 01
Byte : 11
Char : P
Word : 0000
Integer : 2204
DoubleWord: D20000BB
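
For comparison, a Python 3 sketch that decodes all 16 bytes in one call. It rests on two assumptions that are not confirmed in this thread: that the Real is an ordinary 4-byte big-endian IEEE 754 float in the last four bytes, and that the extra BB byte before it can simply be skipped (the x pad code in the format string).

```python
import struct

# The same 16 bytes as above, written as hex for readability.
record = bytes.fromhex('0111500000' '2204' 'd20000bb' 'bb' '42c80000')

# '>' selects big-endian with no padding; 'x' skips one byte (an assumption).
record_fmt = '>BBcHHIxf'
fields = struct.unpack(record_fmt, record)
boolean, byte, char, word, integer, double_word, real = fields

print(struct.calcsize(record_fmt))  # 16
print(fields)
print('%08X' % double_word, real)   # D20000BB 100.0
```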

> So I have written a python class which makes a connection to the
> ftp-server (on the SPS) and gets the file.
> Then there is a function where you can call a value with a startbyte and
> an endbyte. You also have to specify the type. That means you can call
> getValue('REAL',12,16) and you should get back 100 because if you have

I guess your Real is a 4-byte real value and you meant '42 C8 00 00',
which matches the binary representation you give.

> the binary value of 'BB 42 C8 00 00' is 01000010110010000000000000000000
> , first digit is the Sign (which is + or - ), next 8 digits are the
> exponent, in this case 10000101 = 133dec. Now you take away 127 from 133
> then you get six, thats the exponent. The rest
> (110010000000000000000000) has a hex value of C80000 this is 13107200
> decimal. Now you have to multiply 13107200 with 2^6 and 2^-23 and you
> get (tataaaaaa!): 100!
>

I'm not quite sure I understand the format you're describing above.
I dealt some time ago with IEEE and some AMD FPU formats, but you seem
to be describing a format where the exponent crosses byte boundaries.

A function could be somewhat as the following:
>>> def str2real(s):
...     sign = ord(s[0])&0x80 and -1 or 1
...     expo = ((ord(s[0])&0x7f)<<1) + (ord(s[1])>>7) - 127
...     # set the implicit leading 1 at bit 23 of the mantissa
...     mantissa = ((ord(s[1])|0x80)<<16) + (ord(s[2])<<8) + ord(s[3])
...     print 'sign=%d, expo=%d, mantissa=%06X'%(sign,expo,mantissa)
...     return sign*2**(expo-23)*mantissa
...
>>> str2real(x[-4:])
sign=1, expo=6, mantissa=C80000
100.0

Though I'm not sure I hit the goal, because normally the exponent is
a 6-, 7- or 8-bit 2's-complement value, and then there would be a - 64 or
- 128 or - 256 instead of - 127 in the algorithm. Also a 23-bit mantissa
seems a bit strange to me. Even so, the 24th bit is **1**
by default, which is why I put **|0x80**.
With an exact description of your Real format the solution would
be a "Klacks" (a piece of cake).
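
One observation that may help: a sign bit, an 8-bit exponent with a bias of 127 and a 23-bit stored mantissa with an implied leading 1 is the IEEE 754 single-precision layout, which struct decodes with the f format code. Whether the SPS really uses that format is an assumption here. The Python 3 sketch below decodes by hand, with the implied 1 placed at bit 23 of the mantissa, and compares the result with struct for a few sample values (normal numbers only, no zeros, infinities or NaNs):

```python
import struct


def str2real(four_bytes):
    """Decode a big-endian 32-bit real by hand (normal numbers only)."""
    b0, b1, b2, b3 = four_bytes
    sign = -1 if b0 & 0x80 else 1
    expo = ((b0 & 0x7F) << 1) + (b1 >> 7) - 127
    mantissa = ((b1 | 0x80) << 16) + (b2 << 8) + b3
    return sign * mantissa * 2.0 ** (expo - 23)


samples = (b'\x42\xc8\x00\x00', b'\xc2\xc8\x00\x00', b'\x3f\x80\x00\x00')
results = []
for raw in samples:
    by_hand = str2real(raw)
    by_struct = struct.unpack('>f', raw)[0]
    results.append((by_hand, by_struct))
    print(raw.hex(), by_hand, by_struct)
```

If the two columns agree for real data from the device, struct.unpack('>f', ...) alone would be enough.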



> The different data types need different calculations, that's why I asked
> a few things about changing the representation because I only can do
> some things in binary mode or hex mode.
>
> Cheers
> Sorin Marti

Regards Peter
