xlrd represents numbers with 0 decimal places as floats

2,483 views
Skip to first unread message

Jacob Fenwick

unread,
Oct 17, 2009, 5:21:59 PM10/17/09
to python-excel
Hi,

I'm a new user of xlrd.

I was just noticing that any field in Excel represented as number will be interpreted by xlrd as a python float then has one decimal point. For example 4 would be represented as 4.0.

I realize this is easy to fix in any python code by converting to an int and then converting to a str, but I'm wondering if there's a way to change how this works in xlrd. Perhaps a number with 0 decimal places should return an int, or at least it should do the conversion to an int so the user doesn't have to.

Jacob

John Machin

unread,
Oct 17, 2009, 7:37:27 PM10/17/09
to python...@googlegroups.com
On 18/10/2009 8:21 AM, Jacob Fenwick wrote:

> I'm a new user of xlrd.

Welcome ...

> I was just noticing that any field in Excel represented as number will
> be interpreted by xlrd as a python float then has one decimal point. For
> example 4 would be represented as 4.0.

As far as Excel's calculations are concerned, there is no distinction
between a float with zero fractional part and an integer. Excel does
save space in the XLS file where a number can be represented exactly as
a twos-complement 30-bit integer, but this is not mandatory and on
re-reading by Excel so-represented numbers are converted to float and
their history is forgotten.

"one decimal point"? Did you mean "one decimal place"? Floats don't have
decimal places. Excel's 4 is represented as an IEEE 794 64-bit float,
not "4.0".

> I realize this is easy to fix in any python code by converting to an int
> and then converting to a str,

What is the problem that you say needs fixing?

> but I'm wondering if there's a way to
> change how this works in xlrd. Perhaps a number with 0 decimal places
> should return an int, or at least it should do the conversion to an int
> so the user doesn't have to.

You write as though "return an int" and "do the conversion to an int"
are different -- please explain.

Consider a worksheet that is valuing your stock exchange investments. It
will have columns for quantity, price, and market value. Yes, all of the
quantity numbers are expected to be integers. However the price and MV
columns contain floats, some of which may have a zero fractional part.
Do you really want a display treatment for a number that's determined
programatically without inspecting its neighbours?

HTH,
John


Jacob Fenwick

unread,
Oct 17, 2009, 8:38:49 PM10/17/09
to python...@googlegroups.com
I'll start by saying sorry about some of the mistakes I had in my last email, it was written hastily.


On Sat, Oct 17, 2009 at 7:37 PM, John Machin <sjma...@lexicon.net> wrote:
As far as Excel's calculations are concerned, there is no distinction
between a float with zero fractional part and an integer. Excel does
save space in the XLS file where a number can be represented exactly as
a twos-complement 30-bit integer, but this is not mandatory and on
re-reading by Excel so-represented numbers are converted to float and
their history is forgotten.

 
Now that you've explained that Excel actually stores all numbers as floats, that gives me a better understanding of why this is happening.

> I realize this is easy to fix in any python code by converting to an int
> and then converting to a str,

What is the problem that you say needs fixing?

The problem I'm referring to is the discrepancy between how Excel represents numbers to the user and how I was seeing Python represent them after parsing them using xlrd. However, I now understand this is just how Excel stores numbers.

But even if that's how Excel stores numbers, there must be some place that it's storing that metadata about how many decimal places a field should represent to the user. Is that information easily accessible?
 
Consider a worksheet that is valuing your stock exchange investments. It
will have columns for quantity, price, and market value. Yes, all of the
quantity numbers are expected to be integers. However the price and MV
columns contain floats, some of which may have a zero fractional part.
Do you really want a display treatment for a number that's determined
programatically without inspecting its neighbours?

I understand that a quantity should be an integer, while a price could be a dollar amount represented as a float, but I'm not sure about what market value is, and I'm not sure how the three would be related, so I'm not sure I understand the point of your example.

Thanks for your quick reply,

Jacob

John Machin

unread,
Oct 17, 2009, 10:25:37 PM10/17/09
to python...@googlegroups.com
On 18/10/2009 11:38 AM, Jacob Fenwick wrote:
> I'll start by saying sorry about some of the mistakes I had in my last
> email, it was written hastily.
>
> On Sat, Oct 17, 2009 at 7:37 PM, John Machin <sjma...@lexicon.net
> <mailto:sjma...@lexicon.net>> wrote:
>
> As far as Excel's calculations are concerned, there is no distinction
> between a float with zero fractional part and an integer. Excel does
> save space in the XLS file where a number can be represented exactly as
> a twos-complement 30-bit integer, but this is not mandatory and on
> re-reading by Excel so-represented numbers are converted to float and
> their history is forgotten.
>
>
> Now that you've explained that Excel actually stores all numbers as
> floats, that gives me a better understanding of why this is happening.
>
> > I realize this is easy to fix in any python code by converting to
> an int
> > and then converting to a str,
>
> What is the problem that you say needs fixing?
>
>
> The problem I'm referring to is the discrepancy between how Excel
> represents numbers to the user and how I was seeing Python represent
> them after parsing them using xlrd. However, I now understand this is
> just how Excel stores numbers.
>
> But even if that's how Excel stores numbers, there must be some place
> that it's storing that metadata about how many decimal places a field
> should represent to the user. Is that information easily accessible?

It is storing a "number format" string which corresponds to what you see
when you do Format/Cells/Number in the Excel UI. That can be retrieved
by opening the workbook with formatting_info=True and following this
path from a cell's xf_index:

xf = Book.xf_list[xf_index]
fmt_key = xf.format_key
fmt_obj = Book.format_map[fmt_key]
fmt_str = fmt_obj.format_str

xlrd analyses these format strings only as far as attempting to guess if
the number in a cell is intended to represent a date. Nobody has
written a general-purpose format string parser.

> Consider a worksheet that is valuing your stock exchange investments. It
> will have columns for quantity, price, and market value. Yes, all of the
> quantity numbers are expected to be integers. However the price and MV
> columns contain floats, some of which may have a zero fractional part.
> Do you really want a display treatment for a number that's determined
> programatically without inspecting its neighbours?
>
>
> I understand that a quantity should be an integer, while a price could
> be a dollar amount represented as a float, but I'm not sure about what
> market value is, and I'm not sure how the three would be related, so I'm
> not sure I understand the point of your example.

quantity * price -> market_value

123 * 4.56 -> 560.88

123 * 1.00 -> 123.00
123 * 1 -> 123 # you don't want to display it like this

124 * 1.25 -> 155.00
124 * 1.25 -> 155 # you don't want to display it like this

The point is that displaying a whole-number float as an integer
unconditionally is not a good idea.

Cheers,
John

Chris Withers

unread,
Oct 20, 2009, 10:42:52 AM10/20/09
to python...@googlegroups.com
John Machin wrote:
> when you do Format/Cells/Number in the Excel UI. That can be retrieved
> by opening the workbook with formatting_info=True and following this
> path from a cell's xf_index:
>
> xf = Book.xf_list[xf_index]
> fmt_key = xf.format_key
> fmt_obj = Book.format_map[fmt_key]
> fmt_str = fmt_obj.format_str

Given the shim-ish nature of Cell objects already, any chance they could
grow a:

@property
def xf(self):
return self.sheet.book.xf_list[self.xf_index]

(I know the .sheet.book bit is hand waving ;-) )

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

John Machin

unread,
Oct 20, 2009, 6:57:02 PM10/20/09
to python...@googlegroups.com
On 21/10/2009 1:42 AM, Chris Withers wrote:
> John Machin wrote:
>> when you do Format/Cells/Number in the Excel UI. That can be retrieved
>> by opening the workbook with formatting_info=True and following this
>> path from a cell's xf_index:
>>
>> xf = Book.xf_list[xf_index]
>> fmt_key = xf.format_key
>> fmt_obj = Book.format_map[fmt_key]
>> fmt_str = fmt_obj.format_str
>
> Given the shim-ish nature of Cell objects already, any chance they could
> grow a:
>
> @property
> def xf(self):
> return self.sheet.book.xf_list[self.xf_index]
>
> (I know the .sheet.book bit is hand waving ;-) )

and thus can't be implemented as described.


Chris Withers

unread,
Oct 27, 2009, 12:04:58 PM10/27/09
to python...@googlegroups.com

How come? Even if the cell doesn't store a reference back to the sheet
it came from, it can't be hard to add ;-)

Reply all
Reply to author
Forward
0 new messages