float precision with xlrd 0.6.1

273 views
Skip to first unread message

Vaidotas Zemlys

unread,
Jan 31, 2008, 4:27:52 AM1/31/08
to python-excel
Hi,

I am trying to migrate my perl scripts for reading Excel files to
python and I stumbled upon one feature.
Mainly what I want to do is convert Excel file to csv, with my
additional checks. I am using xlrd version 0.6.1 on Ubuntu 7.10 with
python 2.5. For test case I am reading simple table from Excel. The
numbers in this table have a lot of digits after decimal point.
318.939953634425 for example. Now when xlrd reads such number it
becomes 318.93995363442485. (I am doing the reading according to
simple example in readme). Instead of ending 4425, I get 442485. Of
course it is 13 digit after decimal point, and it is on border of
double precision of 16 digits, and probably it is the closest real
number in binary representation, but still I wonder whether it should
be this way. When I enter such number in statistical package R (http://
www.r-project.org) which uses double precision for all real numbers,
such conversion does not happen. That is precisely the reason for my
letter. So my question then would be what is the reason for such
conversion, xlrd, python, or something else? My main gripe is that if
I used xlrd to read Excel file and then imported it again to excel, I
would not get the same result. That would not be the case if I saved
file in csv format read it R then wrote it back and then imported to
Excel. Of course I can write it off to computer treatment of real
numbers, but still I would like to be sure.

Thanks in advance for any insights,

Vaidotas Zemlys

John Machin

unread,
Jan 31, 2008, 6:40:33 AM1/31/08
to python-excel


On Jan 31, 8:27 pm, Vaidotas Zemlys <zem...@gmail.com> wrote:
> Hi,
>
> I am trying to migrate my perl scripts for reading Excel files to
> python and I stumbled upon one feature.
> Mainly what I want to do is convert Excel file to csv, with my
> additional checks. I am using xlrd version 0.6.1 on Ubuntu 7.10 with
> python 2.5. For test case I am reading simple table from Excel. The
> numbers in this table have a lot of digits after decimal point.
> 318.939953634425 for example.

Actually, the numbers have a lot of *bits* after the *binary* point.
What are you using to display them as decimal numbers? AFAIK Excel
will not display more than 15 decimal digits of precision.

Example:
Type 318.939953634425 into cell A1
Type =A1+0.0000000000001 into cell A2
Copy the formula into cells A3:A20
Format cells A1:A20 to show 14 decimal places

You will see
5 x 318.93995363442500
8 x 318.93995363442600
7 x 318.93995363442700
but there are actually 20 *different* values, which xlrd will dig out
for you, to the full precision that Excel uses to store them, not the
limited precision that Excel uses to display them.

> Now when xlrd reads such number it
> becomes 318.93995363442485. (I am doing the reading according to
> simple example in readme).

It doesn't "become" anything. Excel stores numbers in 64-bit IEEE 754
floating point format. xlrd reproduces them without change.

If you wish to round the results to Excel's notion of precision, you
can use the round() builtin function, or something like "%.15g" %
number

HTH,
John

Vaidotas Zemlys

unread,
Jan 31, 2008, 7:06:31 AM1/31/08
to python...@googlegroups.com

Le jeudi 31 janvier 2008 à 03:40 -0800, John Machin a écrit :

> Actually, the numbers have a lot of *bits* after the *binary* point.
> What are you using to display them as decimal numbers? AFAIK Excel
> will not display more than 15 decimal digits of precision.
>
> Example:
> Type 318.939953634425 into cell A1
> Type =A1+0.0000000000001 into cell A2
> Copy the formula into cells A3:A20
> Format cells A1:A20 to show 14 decimal places
>
> You will see
> 5 x 318.93995363442500
> 8 x 318.93995363442600
> 7 x 318.93995363442700
> but there are actually 20 *different* values, which xlrd will dig out
> for you, to the full precision that Excel uses to store them, not the
> limited precision that Excel uses to display them.
>
> > Now when xlrd reads such number it
> > becomes 318.93995363442485. (I am doing the reading according to
> > simple example in readme).
>
> It doesn't "become" anything. Excel stores numbers in 64-bit IEEE 754
> floating point format. xlrd reproduces them without change.
>
> If you wish to round the results to Excel's notion of precision, you
> can use the round() builtin function, or something like "%.15g" %
> number
>

Aaaaah! Thank you very much for the explanation. I did not know that
Excel shows only 15 digits of precision. I increased the number of
digits shown in Excel until appearance of zeroes and assumed that if
zeroes appear that is the end of the precision. As you pointed out that
is not true. I made a quick experiment created formula whose result is
1/3 and behold zeroes appear when there should be none. I checked
OpenoOffice.org 2.3 and Excel 2007, both give zeroes after 15 digits.
But if the actual value is stored in bigger precision, then this is a
bit misleading. Although not unexpected from Microsoft :)

Thank you very much for the answer, that was precisely what I needed.
And now I know that when you save the file under csv in Excel or
OpenOffice.org you lose precision, not that this is very important in
applications I use, but still very valuable information. Thanks again!

Vaidotas Zemlys


Kelly

unread,
Apr 5, 2011, 2:56:35 PM4/5/11
to python...@googlegroups.com
I understand that the full precision of a number is available in xlrd.  My question is whether or not there is a way to discover, via info available in xlrd, the precision with which a number was entered in the Excel file.  

My scenario is that a user enters numbers in Excel .. usually in a General format cell .. sometimes adjusting the precision displayed manually, but not always.  I need to read the files, do some manipulation, and output the data as text.  I need the number values in the output text to appear as they were displayed in the cell by Excel.  

I have a some data for few numbers I entered, read and output below.  It appears that Excel defaults to a 9 digit precision, but that can be changed by manually adjusting the precision via the toolbar (see Excel Cell Display for last number).   For the default entry / display, xlrd get a 'General' format ..no information about the precision on entry.  For the manually adjusted one .. it seems that the format_str reflects the precision displayed in the cell.  From the difference between the "%-25s" and "%-25.20f" data output it appears that Python "%s" formatting default maxes out at 15 digits .. or is something else going on? 

Excel Version             Excel 97-2003
xlrd version              xlrd-0.7.1-py2.7-win32.egg

Entered value               Excel Cell Display                "%-25s" % cell.value      "%-25.20f" % cell.value          xlrd cell format_str     
0.0000625                    0.0000625                            6.25e-05                         0.00006250000000000000    General                  
0.0009765625               0.000976563                        0.0009765625                 0.00097656250000000000    General                  
0.0006866455078125     0.000686646                       0.000686645507812         0.00068664550781250000    General                  
0.0006866455078125     0.0006866455078125000      0.000686645507812         0.00068664550781250000    0.0000000000000000000    

Any help or insights you could provide would be appreciated.
Thanks,
Kelly

John Machin

unread,
Apr 7, 2011, 6:13:15 PM4/7/11
to python...@googlegroups.com
On Wednesday, April 6, 2011 4:56:35 AM UTC+10, Kelly wrote:
I understand that the full precision of a number is available in xlrd.  My question is whether or not there is a way to discover, via info available in xlrd, the precision with which a number was entered in the Excel file.  

In general, this is not possible: (1) If the format is "General", you can't tell whether the user entered .1234 (4 digits of precision) or .1234000 (7 digits of precision) (2) Assuming that a non-General format was intended by the user to reflect the precision may not be correct.
 

My scenario is that a user enters numbers in Excel .. usually in a General format cell .. sometimes adjusting the precision displayed manually, but not always.  I need to read the files, do some manipulation, and output the data as text.  I need the number values in the output text to appear as they were displayed in the cell by Excel.  

... which may have no relation to the data-entry precision. Which do you want?

I have a some data for few numbers I entered, read and output below.  It appears that Excel defaults to a 9 digit precision,

That's a DISPLAY precision ...

 
but that can be changed by manually adjusting the precision via the toolbar (see Excel Cell Display for last number).   For the default entry / display, xlrd get a 'General' format ..no information about the precision on entry.

That's correct. Excel doesn't maintain that precision anywhere.

 
 For the manually adjusted one .. it seems that the format_str reflects the precision displayed in the cell.

Other way around: the displayed precision is a direct consequence of the format_str.

 
 From the difference between the "%-25s" and "%-25.20f" data output it appears that Python "%s" formatting default maxes out at 15 digits .. or is something else going on? 

"%s" % some_float produces exactly the same result as str(some_float), which doesn't use the full precision required to reproduce the original number. To do that use "%r" or repr():

    >>> x = 1/3.
    >>> str(x)
    '0.333333333333'
    >>> str(x).count('3')
    12
    >>> repr(x)
    '0.3333333333333333'
    >>> repr(x).count('3')
    16
    >>> float(str(x)) - x
    -3.33288951992472e-13
    >>> float(repr(x)) - x
    0.0

When you say "maxes out at 15 digits" you mean "15 fractional digits" which is not the same as "15 significant digits". str() does 12 of the latter
 

Excel Version             Excel 97-2003
xlrd version              xlrd-0.7.1-py2.7-win32.egg

Entered value               Excel Cell Display                "%-25s" % cell.value      "%-25.20f" % cell.value          xlrd cell format_str     
0.0000625                    0.0000625                            6.25e-05                         0.00006250000000000000    General                  
0.0009765625               0.000976563                        0.0009765625                 0.00097656250000000000    General                  
0.0006866455078125     0.000686646                       0.000686645507812         0.00068664550781250000    General                  
0.0006866455078125     0.0006866455078125000      0.000686645507812         0.00068664550781250000    0.0000000000000000000    

Seems to me that the "Excel Cell Display" is misleading in both row 3 and row 4.

 >>> y = 0.0006866455078125
 >>> str(y)
 '0.000686645507812'
 >>> repr(y)
 '0.0006866455078125'

Summary: input precision is lost, and saved precision and Excel display precision have only a loose relationship; what, precisely [pun intended], do you want?

Kelly

unread,
Apr 25, 2011, 12:14:45 AM4/25/11
to python...@googlegroups.com
Thank you for the clarifications .. to your question .. I think we want input precision w.r.t fractional digits in this case .. but maybe I've not thought this through properly.  

In this case, we are using Excel as a requirements specification / capture tool ... and then propagating entered / computed data out to a number of other documents that will be manually reviewed by human users who would expect the data values to all carry the same precision and look like the data entered / captured / computed in the Excel worksheet.   Note, if all values in a column / sheet carried the same, or nearly the same precision, we could just pick a standard representation for output.  But, the number of fractional digits in the data can range from 0 up to 20 .. at least in data we have been working with so far.

We have looked at changing all cell formats to Text (and locking the formatting down) which would allow us to capture the input precision via the text string .. this seems to work, but carries other ramifications .. especially w.r.t. in-worksheet formulas and computations we are doing on entered data.  Maybe a combination of Text cells and General cells with manually adjusted precision will be a good compromise .. will check that out.  Otherwise, the only other thoughts I had were to have the user define the precision they want for all entered values in an adjacent cell .. but that seems onerous, error-prone, and likely to be forgotten on updates.

Any further thoughts you have would be appreciated .. especially if I have missed the obvious for what seems like it would be a fairly standard use case.
Thanks,
Kelly


John Yeung

unread,
Apr 25, 2011, 1:19:06 AM4/25/11
to python...@googlegroups.com
On Mon, Apr 25, 2011 at 12:14 AM, Kelly <morrel...@gmail.com> wrote:
> But, the number of
> fractional digits in the data can range from 0 up to 20 .. at least in data
> we have been working with so far.

You're already losing precision then. Excel uses IEEE 754 floating
point for all its numeric data, which is what xlrd is reading:

<http://support.microsoft.com/kb/78113>

Unfortunately, I do not have any quick, easy suggestions to get the
precision you are after. (Python itself does support
arbitrary-precision decimals, but I don't know of a ready-made
Excel-like GUI that does.)

John Y.

John Machin

unread,
Apr 25, 2011, 3:02:23 AM4/25/11
to python...@googlegroups.com
On Monday, April 25, 2011 3:19:06 PM UTC+10, John Yeung wrote:
On Mon, Apr 25, 2011 at 12:14 AM, Kelly <morrel...@gmail.com> wrote:
> But, the number of
> fractional digits in the data can range from 0 up to 20 .. at least in data
> we have been working with so far.

You're already losing precision then.  Excel uses IEEE 754 floating
point for all its numeric data, which is what xlrd is reading:

  <http://support.microsoft.com/kb/78113>

It's worse. That article says that IEEE 754 restricts them to 15 decimal digits of precision. This is an imprecise statement. The IEEE 64-bit binary format provides for a mantissa of 53 significant bits, equivalent to 15.95 decimal digits of precision. What they actually do with it is to muck about trying to pretend that it is actually an exactly-15-decimal-digits system, a bit like Python's decimal.Decimal with a precision of 15 instead of 28. See section 2 of this: http://www.cs.berkeley.edu/~wkahan/Mindless.pdf

John Machin

unread,
Apr 25, 2011, 3:46:48 AM4/25/11
to python...@googlegroups.com
On Monday, April 25, 2011 2:14:45 PM UTC+10, Kelly wrote:

We have looked at changing all cell formats to Text (and locking the formatting down) which would allow us to capture the input precision via the text string .. this seems to work, but carries other ramifications .. especially w.r.t. in-worksheet formulas and computations we are doing on entered data.

What ramifications?
 
 Maybe a combination of Text cells and General cells with manually adjusted precision will be a good compromise .. will check that out.  Otherwise, the only other thoughts I had were to have the user define the precision they want for all entered values in an adjacent cell .. but that seems onerous, error-prone, and likely to be forgotten on updates.

Any further thoughts you have would be appreciated .. especially if I have missed the obvious for what seems like it would be a fairly standard use case.

Sorry, all I can offer is sympathy.

derek

unread,
Apr 26, 2011, 2:48:22 AM4/26/11
to python-excel
I don't think I agree that specialised data entry is a "standard case"
for Excel. I think the latter is great for arbitrary data
manipulation and data display. Data capture, checking and storage are
more in the domain of databases. Excel is a great tool; but its only
one tool, and trying to it extend to cover all possible cases will be
problematic. Maybe you can consider writing a small GUI with, say, a
SQLite backend, to handle the case you describe above.

Derek

Kelly

unread,
Apr 28, 2011, 11:33:29 AM4/28/11
to python-excel
Thank you all for your help .. the issues and possible solutions or
workarounds are beginning to gel

@John Yeung - pointing out loss of precision and reference to IEE 754

@John Machin - your early clarifications helped bring focus to my
problem and your latest reference to Mindless.pdf looks like it will
help us avoid some pitfalls go forward

@Derek - maybe you are right and this isn't a standard case .. I would
heave thought that whatever number a user entered into a worksheet (of
any arbitrary precision) should be recoverable and able to be
reproduced in text exactly as entered .. guess not

Thanks again.
Kelly

derek

unread,
Apr 29, 2011, 3:18:51 AM4/29/11
to python-excel
On Apr 28, 5:33 pm, Kelly <morrell.ke...@gmail.com> wrote:
> Thank you all for your help .. the issues and possible solutions or
> workarounds are beginning to gel
>
> @John Yeung - pointing out loss of precision and reference to IEE 754
>
> @John Machin - your early clarifications helped bring focus to my
> problem and your latest reference to Mindless.pdf looks like it will
> help us avoid some pitfalls go forward
>
> @Derek - maybe you are right and this isn't a standard case .. I would
> heave thought that whatever number a user entered into a worksheet (of
> any arbitrary precision) should be recoverable and able to be
> reproduced in text exactly as entered .. guess not

Kelly

Just for clarification; my point was not about the specific issue of
"recording precision" (as John says "Excel doesn't maintain that
precision anywhere" and you'll need to take up that with the
developers of Excel) but rather the more general principle of using
Excel as a data capture and recording "device" - except in the most
simplistic cases, I do not think that this is the appropriate tool to
use.

Derek

Kelly

unread,
Apr 29, 2011, 9:33:36 AM4/29/11
to python...@googlegroups.com
I think I agree with you .. We are probably pushing the limits of Excel .. We picked excel since it is already on everyone's desktop and we can easily control it using standard version control systems e.g. Clearcase

Do you have any alternative tools / approaches you could recommend?

Thanks, Kelly
Sent from my Verizon Wireless BlackBerry
--
You received this message because you are subscribed to the Google Groups "python-excel" group.
To post to this group, send an email to python...@googlegroups.com.
To unsubscribe from this group, send email to python-excel...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/python-excel?hl=en-GB.

derek

unread,
Apr 30, 2011, 4:37:25 AM4/30/11
to python-excel
On Apr 29, 3:33 pm, "Kelly" <morrell.ke...@gmail.com> wrote:
> I think I agree with you .. We are probably pushing the limits of Excel .. We picked excel since it is already on everyone's desktop and we can easily control it using standard version control systems e.g. Clearcase
>
> Do you have any alternative  tools / approaches you could recommend?
>
> Thanks, Kelly

I am now biased towards web-based systems because of their "zero
deploy, instant update" aspect. For desktop work, PyQT and Python are
a good multi-platform option. For example, we are working on
VisTrails (www.vistrails.org), which is a pretty substantial package,
built using these tools.

John Yeung

unread,
Apr 30, 2011, 8:45:06 AM4/30/11
to python...@googlegroups.com
On Sat, Apr 30, 2011 at 4:37 AM, derek <game...@gmail.com> wrote:
> On Apr 29, 3:33 pm, "Kelly" <morrell.ke...@gmail.com> wrote:
>>
>> Do you have any alternative  tools / approaches you could recommend?
>>
>> Thanks, Kelly
>
> I am now biased towards web-based systems because of their "zero
> deploy, instant update" aspect.  For desktop work, PyQT and Python are
> a good multi-platform option.

At the risk of venturing even a bit further off the topic of
"python-excel", I would like to mention Pyjamas <http://pyjs.org/> for
those folks who are good with Python but not so good with JavaScript,
but still want to deploy a GUI application for the Web.

John Y.

Reply all
Reply to author
Forward
0 new messages