Extract double in binary file

Pascal

Nov 26, 2003, 9:16:30 AM

Hello,

I have a binary file with data in it.
This file comes from an old MS-DOS application (multilog, ~1980).
In this application, a field is declared as a 'decimal' (999 999
999.99).
I put 0.00 in the field and saved the record to the file.
When I look at the binary file (with Python or a hex editor), the
field is stored in 8 bytes: 00-00-00-00-00-00-7F-00.
I tried unpack from the struct module, but the result isn't right.

Can someone help me?

Thanks.

Gandalf

Nov 26, 2003, 10:19:29 AM

to comp.lang.python

Most likely it is a BCD field. Please look for the number in the file
with a simple text viewer. If it is readable there, you can read the
number as a string (the length is len('999999999.99') or similar) and
convert it with float(), or with int() or long() once the decimal point
is removed.

Cheers,

L

Terry Reedy

Nov 26, 2003, 11:55:43 AM

"Gandalf" <gan...@geochemsource.com> wrote in message
news:mailman.1105.106985...@python.org...

> >I have a binary file with data in it.
> >This file comes from an old MS-DOS application (multilog, ~1980).
> >In this application, a field is declared as a 'decimal' (999 999
> >999.99).
> >I put 0.00 in the field and saved the record to the file.
> >When I look at the binary file (with Python or a hex editor), the
> >field is stored in 8 bytes: 00-00-00-00-00-00-7F-00.
> >I tried unpack from the struct module, but the result isn't right.
> >
> >Can someone help me?
> >
> Most likely it is a BCD field.

I believe the hex the OP gave is a double 0.0e0 (exponents are biased).
OTOH, BCD for 11 digits would be 6 bytes, all 0, which the data also has.
Putting -1.0 and 1.0 in the field and storing them would clarify the
storage format if it is really not known.
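
A quick way to test the IEEE guess is to hand the eight bytes to struct directly. The sketch below is written for Python 3 (not the Python 2 used elsewhere in this thread), and the names raw, as_le and as_be are illustrative only. Neither byte order yields exactly 0.0, which hints that the field is probably not a plain IEEE double.

```python
import struct

# The 8 bytes the original poster found for 0.00, in file order.
raw = bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7F, 0x00])

# Interpret them as an IEEE 754 double in both byte orders.
(as_le,) = struct.unpack('<d', raw)  # little-endian
(as_be,) = struct.unpack('>d', raw)  # big-endian

# Both values are tiny but non-zero.
print(as_le == 0.0, as_be == 0.0)
```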

tjr

Christopher A. Craig

Nov 26, 2003, 12:20:01 PM

to pytho...@python.org

pascal...@free.fr (Pascal) writes:

You're sort of vague here, but I don't think struct is going to help
you regardless. "decimal" in this case is almost certainly some sort
of BCD, which isn't a standard C struct (and therefore unknown to the
struct module).
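
For reference, a minimal sketch of what decoding a packed BCD field could look like, written for Python 3. The layout is an assumption, not something confirmed for this file: unsigned, two decimal digits per byte, high nibble first, two implied decimal places. The name unpack_bcd is hypothetical, and real BCD formats often add a sign nibble.

```python
from decimal import Decimal

def unpack_bcd(raw, decimals=2):
    """Decode unsigned packed BCD (two digits per byte, high nibble first)."""
    digits = []
    for byte in raw:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("byte 0x%02x is not packed BCD" % byte)
        digits.append(str(high))
        digits.append(str(low))
    # Shift the decimal point left by `decimals` places.
    return Decimal(int("".join(digits))).scaleb(-decimals)

print(unpack_bcd(bytes([0x00, 0x01, 0x23, 0x45])))
```

Under these assumptions the poster's sample for 0.00 is rejected, because 0x7F contains the nibble F, which is not a decimal digit.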

You really need to figure out how the data is stored. Based on your
one example it looks like it's stored as a series of 7 bit values
representing the decimal digits with 0x7f indicating the decimal
point. If this is correct you could use something like

tstr = ''
for c in instr:
    if c == chr(0x7f):
        tstr += '.'
    else:
        tstr += str(ord(c))
fl = float(tstr)

With two major caveats:
1) that this is going to return a float, not a decimal
2) There's no way for me to even guess how negative numbers are
represented

--
Christopher A. Craig <list-...@ccraig.org>
"By rights we shouldn't be here." -- Sam in Peter Jackson's
"The Two Towers" while standing in Osgiliath, where he shouldn't be.

Colin Brown

Nov 26, 2003, 2:38:24 PM

"Pascal" <pascal...@free.fr> wrote in message
news:e567c03a.03112...@posting.google.com...

If the number is saved in a floating point representation (IEEE?),
typically [sign][exponent][fraction] then you really need to know
what the type is. For example, I had to make cross-platform real
numbers at one stage and fabricated them as below.

Colin Brown
PyNZ

import math

def vmsR4(real):
    '''vmsR4(real): returns an integer that is equivalent to a VMS real*4 '''
    (m, e) = math.frexp(real)
    if m == 0.0:
        return 0
    else:
        sign = m < 0
        exp = e + 128
        mant = int((16777216L * abs(m)) + 0.5) - 8388608
        # Low 16 mantissa bits go in the upper word; mask them so the
        # result stays within 32 bits.
        return (sign << 15) + (exp << 7) + (mant >> 16) + ((mant & 0xFFFF) << 16)

Pascal

Nov 27, 2003, 4:07:58 AM

pascal...@free.fr (Pascal) wrote in message news:<e567c03a.03112...@posting.google.com>...

First, thanks for the answers.

Some precisions:
0.00 > 00-00-00-00-00-00-7F-00
1.00 > 00-00-00-00-00-00-00-81
2.00 > 00-00-00-00-00-00-00-82
3.00 > 00-00-00-00-00-00-40-82
4.00 > 00-00-00-00-00-00-00-83

10.00 > 00-00-00-00-00-00-20-84
1000.00 > 00-00-00-00-00-00-7A-8A

1.11 > 14-AE-47-E1-7A-14-0E-81

Terry Reedy

Nov 27, 2003, 1:00:03 PM

"Pascal" <pascal...@free.fr> wrote in message

news:e567c03a.0311...@posting.google.com...
> Some precisions:

('examples': 'precisions' does not work here in English)

> 0.00 > 00-00-00-00-00-00-7F-00
> 1.00 > 00-00-00-00-00-00-00-81
> 2.00 > 00-00-00-00-00-00-00-82
> 3.00 > 00-00-00-00-00-00-40-82
> 4.00 > 00-00-00-00-00-00-00-83
>
> 10.00 > 00-00-00-00-00-00-20-84
> 1000.00 > 00-00-00-00-00-00-7A-8A
>
> 1.11 > 14-AE-47-E1-7A-14-0E-81

The only obvious pattern I see is that 2**0 -> 81, 2**1 -> 82, ...
2**9 -> 8A (where A == 10); i.e., for non-zero values, the last byte is
81 + the exponent of the largest power of two not exceeding the value,
which looks like some type of float. The first 6 bytes are 0 if the
value is integral. It may be a proprietary format.
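
The pattern can be checked against the posted samples with a short Python 3 sketch. The helper name predicted_last_byte is illustrative only; the (value, last byte) pairs are copied from the samples in this thread.

```python
import math

# (value, last byte) pairs copied from the samples posted in this thread.
samples = [(1.0, 0x81), (2.0, 0x82), (3.0, 0x82), (4.0, 0x83),
           (10.0, 0x84), (1000.0, 0x8A), (1.11, 0x81)]

def predicted_last_byte(value):
    """0x81 plus the exponent of the largest power of two <= value."""
    # frexp returns (m, e) with value == m * 2**e and 0.5 <= m < 1,
    # so the largest power of two not exceeding value is 2**(e - 1).
    _, e = math.frexp(value)
    return 0x81 + (e - 1)

for value, observed in samples:
    print(value, hex(observed), hex(predicted_last_byte(value)))
```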

TJR

Francis Avila

Nov 28, 2003, 4:45:31 AM

Dennis Lee Bieber wrote in message ...
>Pascal fed this fish to the penguins on Thursday 27 November 2003 01:07
>am:

>>
>> Some precisions:
>> 0.00 > 00-00-00-00-00-00-7F-00
>> 1.00 > 00-00-00-00-00-00-00-81
>> 2.00 > 00-00-00-00-00-00-00-82
>> 3.00 > 00-00-00-00-00-00-40-82
>> 4.00 > 00-00-00-00-00-00-00-83
>>
>> 10.00 > 00-00-00-00-00-00-20-84
>> 1000.00 > 00-00-00-00-00-00-7A-8A
>>
>> 1.11 > 14-AE-47-E1-7A-14-0E-81

>converting exponents excess 81...
>

> 0 1 00 00 00 00 00 00 00
> 1 1 00 00 00 00 00 00 00
> 1 1 40 00 00 00 00 00 00
> 2 1 00 00 00 00 00 00 00
> 3 1 20 00 00 00 00 00 00

That is a bizarre format, and of course I had to implement it. (Even C is
more pleasant in Python!).

It works for the cases given, but do find out where the sign bit is for the
mantissa. (This code assumes it's the MSB of the mantissa.)

Also tease out the NaN and +-Infinity cases.

--- Code ---
#! /usr/bin/env python
# by Francis Avila
#
# Decode a peculiar binary floating point encoding
# used by 'multilog', an old dos spreadsheet.

import struct

_known = (('0.00', '\x00\x00\x00\x00\x00\x00\x7F\x00'),
          ('1.00', '\x00\x00\x00\x00\x00\x00\x00\x81'),
          ('2.00', '\x00\x00\x00\x00\x00\x00\x00\x82'),
          ('3.00', '\x00\x00\x00\x00\x00\x00@\x82'),
          ('4.00', '\x00\x00\x00\x00\x00\x00\x00\x83'),
          ('10.00','\x00\x00\x00\x00\x00\x00 \x84'),
          ('1000.00','\x00\x00\x00\x00\x00\x00z\x8a'),
          ('1.11', '\x14\xaeG\xe1z\x14\x0e\x81'))

def _test():
    tests = [(str(float(i)),str(dectofloat(j))) for i,j in _known]
    results = [expect==got for expect,got in tests]

    failed = [tests[i] for i, passed in enumerate(results) if not passed]

    if failed: return failed
    else: return 'Passed'

def bin(I):
    """Return list of bits of int I, most significant bit first."""
    if I < 0:
        raise ValueError, "I must be >= 0"
    bits = []
    if I == 0:
        bits = [0]
    while I>0:
        r = (I & 0x1)
        if r: r = 1
        bits.append(r)
        I >>= 1
    bits.reverse()
    return bits

def binaryE(n, exp):
    """Return result of a binary n*10**exp.

    As n*10**exp is to decimal, so binaryE(n, exp) is to binary.
    """
    return sum([2**(exp-i) for i,bit in enumerate(bin(n)) if bit])

#Add special cases here:
SPECIAL = {'\x00\x00\x00\x00\x00\x00\x7f\x00':0.0}

def dectofloat(S):
    """Return float value of 8-byte 'decimal' string."""
    if S in SPECIAL:
        return SPECIAL[S]

    # Convert to byteswapped long.
    N, = struct.unpack('<Q', S)

    # Grab exponent and mantissa parts using bitmasks.
    # The eight MSBs are exponent; rest mantissa.
    exp, mant = (N&(0xffL<<56))>>56, N&~(0xffL<<56)

    exp -= 0x81 # Exponential part is excess 0x81 (e.g., 0x82 is 1).

    msign = mant & (0x80L<<48) # MSB of mantissa is sign bit. 0==positive.
    if not msign:
        msign = 1
    else:
        msign = -1

    mant |= (0x80L<<48) # Add implied 1 to the MSB of mantissa.

    # Now, binary scientific notation:

    return float(msign * binaryE(mant, exp))

Bengt Richter

Nov 28, 2003, 8:10:09 PM

On Sat, 29 Nov 2003 00:30:42 GMT, Dennis Lee Bieber <wlf...@ix.netcom.com> wrote:

>Francis Avila fed this fish to the penguins on Friday 28 November 2003
>01:45 am:

>
>>
>>
>> That is a bizarre format, and of course I had to implement it. (Even C
>> is more pleasant in Python!).
>>

> And I thought /I/ was the masochist...
>
> I do have to confess that short tests with struct did reveal that, on
>my system, regular doubles do have the same byte order as the original
>data. I'm just more comfortable with seeing hex representations of
>numbers with the MSB on the left.

>
>> It works for the cases given, but do find out where the sign bit is
>> for the mantissa. (This code assumes it's the MSB of the mantissa.)
>>

> I suspect most would consider the format my college computer used to
>be weird... Xerox Sigma 6... Excess 64 (decimal) (as I recall) exponent
>powers of sixteen! A "normalized" mantissa could have up to three
>leading 0 bits, and there were no "hidden" bit.
>
> S eeeeeee mmmmmmmmm mmmmmmmm mmmmmmmm ...
>
Does the OP have the ability to generate example values at will,
or is it a matter of scrounging through some old recorded data with
no way of making more?

If he can make more, I'd suggest a number with most of the nybbles
of data individually numbered, e.g.,
0xfedcba987654321
E.g., if it's a 64-bit format, a 64-bit integer converted would probably
tell a lot about where the bits go (how many get shifted out, hidden,
how they're ordered). And then the same number negative. E.g.,

>>> 0xfedcba987654321
1147797409030816545L
>>> -0xfedcba987654321
-1147797409030816545L
>>> hex(0xfedcba987654321)
'0xFEDCBA987654321L'
>>> hex(-0xfedcba987654321)
'-0xFEDCBA987654321L'
<grr>useless hex representation for looking at bits ...</grr>

>>> hex(-0xfedcba987654321 +2**80)
'0xFFFFF0123456789ABCDFL'

Regards,
Bengt Richter

Pascal

Dec 1, 2003, 11:45:04 AM

First, thanks for trying!

Maybe these values will tell you something:
1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x81
-1 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x81
2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x82
-2 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x82
3 0x0 0x0 0x0 0x0 0x0 0x0 0x40 0x82
-3 0x0 0x0 0x0 0x0 0x0 0x0 0xc0 0x82
1.1 0xcd 0xcc 0xcc 0xcc 0xcc 0xcc 0xc 0x81
1.2 0x9a 0x99 0x99 0x99 0x99 0x99 0x19 0x81
1.3 0x66 0x66 0x66 0x66 0x66 0x66 0x26 0x81
1.4 0x33 0x33 0x33 0x33 0x33 0x33 0x33 0x81
1.01 0x48 0xe1 0x7a 0x14 0xae 0x47 0x1 0x81
0.01 0xd7 0xa3 0x70 0x3d 0xa 0xd7 0x23 0x7a
1.02 0x8f 0xc2 0xf5 0x28 0x5c 0x8f 0x2 0x81
0.02 0xd7 0xa3 0x70 0x3d 0xa 0xd7 0x23 0x7b

Bengt Richter

Dec 1, 2003, 1:31:00 PM

This needs some cleanup and optimization, but for the above it seems to work:

====< PascalParent.py >=================================
#First thanks for trying!

#May be these values will tell you somethings:
data = """\
1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x81
-1 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x81
2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x82
-2 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x82
3 0x0 0x0 0x0 0x0 0x0 0x0 0x40 0x82
-3 0x0 0x0 0x0 0x0 0x0 0x0 0xc0 0x82
1.1 0xcd 0xcc 0xcc 0xcc 0xcc 0xcc 0xc 0x81
1.2 0x9a 0x99 0x99 0x99 0x99 0x99 0x19 0x81
1.3 0x66 0x66 0x66 0x66 0x66 0x66 0x26 0x81
1.4 0x33 0x33 0x33 0x33 0x33 0x33 0x33 0x81
1.01 0x48 0xe1 0x7a 0x14 0xae 0x47 0x1 0x81
0.01 0xd7 0xa3 0x70 0x3d 0xa 0xd7 0x23 0x7a
0.0 0x0 0x0 0x0 0x0 0x0 0x0 0x7F 0x0
"""
def bytes2float(bytes):
    if bytes == [0,0,0,0,0,0,0x7f,0]: return 0.0
    b = bytes[:]
    sign = bytes[-2]&0x80
    b[-2] |= 0x80 # hidden most significant bit in place of sign
    exp = bytes[-1] - 0x80 -56 # exponent offset
    acc = 0L
    for i,byte in enumerate(b[:-1]):
        acc |= (long(byte)<<(i*8))
    return (float(acc)*2.0**exp)*((1.,-1.)[sign!=0])

for line in data.splitlines():
    nlist = line.split()
    fnum = float(nlist[0])
    le_bytes = map(lambda x:int(x,16) ,nlist[1:])
    test = bytes2float(le_bytes)
    print ' in: %r\nout: %r\n'%(fnum,test)
========================================================
Result:

[10:44] C:\pywk\clp>PascalParent.py
in: 1.0
out: 1.0

in: -1.0
out: -1.0

in: 2.0
out: 2.0

in: -2.0
out: -2.0

in: 3.0
out: 3.0

in: -3.0
out: -3.0

in: 1.1000000000000001
out: 1.1000000000000001

in: 1.2
out: 1.2

in: 1.3
out: 1.3

in: 1.3999999999999999
out: 1.3999999999999999

in: 1.01
out: 1.01

in: 0.01
out: 0.01

in: 0.0
out: 0.0

HTH

Regards,
Bengt Richter

Pascal

Dec 2, 2003, 12:40:50 PM

A very big thanks to you.
The function runs perfectly (after installing Python 2.3, which it needs for the enumerate function).
If you can, please give me more details on the method or the number's representation.

Thanks a lot!

Bengt Richter

Dec 2, 2003, 5:17:42 PM

According to an old MASM 5.0 programmer's guide, there was a Microsoft Binary format
for encoding real numbers, both short (32 bits) and long (64 bits).

There were 3 parts:

1. Biased 8-bit exponent in the highest byte (last in the little-endian view we've been using).
It says the bias is 0x81 for short numbers and 0x401 for long, but I'm not sure how that lines up.
I just got there by experimentation.

2. Sign bit (0 for +, 1 for -) in the upper bit of the second-highest byte.

3. All except the first set bit of the mantissa in the remaining 7 bits of the second-highest byte
and the rest of the bytes. Since the most significant bit for non-zero numbers is always 1, it
is not represented. But if it were, it would share the same bit position as the sign
(that's why I or-ed it in there to complete the actual mantissa).

MASM also supported a 10-byte format similar to IEEE. I didn't see anything in that section
on NaNs and INFs.
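
The three parts above can be put together in a compact decoder. This is only a sketch for Python 3, not the MASM guide's definition: the name mbf64_to_float is made up for illustration, the offsets 0x80 and 56 are the ones found by experiment in this thread, and treating a zero exponent byte as 0.0 is an assumption based on the single zero sample posted. NaNs and infinities are not handled.

```python
import math

def mbf64_to_float(raw):
    """Decode 8 little-endian bytes in the layout described above."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    exponent_byte = raw[7]
    if exponent_byte == 0:
        # Assumption: a zero exponent byte means 0.0, as in the one
        # zero sample in this thread (00-00-00-00-00-00-7F-00).
        return 0.0
    sign = -1.0 if raw[6] & 0x80 else 1.0
    # 56-bit mantissa: the hidden leading 1 takes the sign bit's place.
    mantissa = int.from_bytes(raw[:7], "little") | (1 << 55)
    # Offsets 0x80 and 56 were found by experiment in this thread.
    return sign * math.ldexp(mantissa, exponent_byte - 0x80 - 56)

for hex_bytes in ("00 00 00 00 00 00 00 81",   # sample posted for 1
                  "00 00 00 00 00 00 c0 82",   # sample posted for -3
                  "cd cc cc cc cc cc 0c 81"):  # sample posted for 1.1
    print(mbf64_to_float(bytes.fromhex(hex_bytes)))
```

The format carries a 56-bit mantissa while a Python float keeps 53 bits, so the lowest bits are rounded away in the conversion.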

HTH

Regards,
Bengt Richter

Pascal

Dec 3, 2003, 10:05:38 AM

Thanks a lot!

Pascal

Dec 3, 2003, 10:06:10 AM

Thanks a lot!