type conversion problem

2,447 views
Skip to first unread message

Rob van Harrevelt

unread,
Jan 20, 2011, 6:26:06 AM1/20/11
to openpyxl-users
Hello,

When a cell with a number is read, the cell sometimes gets data_type
's' (text). This problem occurs when the xlsx file has been written
with openpyxl and when the cell is displayed with scientific format.
The type conversion problem does not occur when the xlsx file has been
created manually in Excel.

The problem can be demonstrated with the following Python script:

#!/usr/bin/python
from openpyxl.reader.excel import load_workbook
from openpyxl.reader.excel import load_workbook
from openpyxl.workbook import Workbook
from openpyxl.writer.excel import save_workbook

XLSXFILE = "test1.xlsx"

# create xlsx file
book = Workbook()
cell = book.get_active_sheet().cell('A1')
cell.value = 0.12e-30
print 'cell value written:'
print cell.value
print type(cell.value)
print cell.data_type
save_workbook(book, XLSXFILE)

print ' '

# read xlsx file
book = load_workbook(XLSXFILE)
cell = book.get_active_sheet().cell('A1')
print 'cell value read:'
print cell.value
print type(cell.value)
print cell.data_type

The output of this script is:

cell value written:
1.2e-31
<type 'float'>
n

cell value read:
1.2e-31
<type 'unicode'>
s

Is there a way to avoid this problem or is it a problem with openpyxl?

Thank you in advance,

Rob van Harrevelt

Eric Gazoni

unread,
Jan 25, 2011, 3:37:54 AM1/25/11
to openpyxl-users
Hi Rob,
you spotted a tiny bug indeed, now fixed in version 1.3.1 on the
repository.

https://bitbucket.org/ericgazoni/openpyxl/changeset/19b7ea3262d3

Cheers,
Eric

Rob van Harrevelt

unread,
Jan 25, 2011, 12:01:03 PM1/25/11
to openpyxl-users
Hi Eric,

Thank you for your reply, I tried the new version and it works
correctly.

I found two other problems.

When a number is written to an xlsx file with openpyxl, then the
numbers are not stored with all decimals. For example, when I write a
value 1.23456789123456 to an xlsx file, then the value in the xlsx
file is 1.23456789123. This is because in write_worksheet_data in
writer/worksheet.py the numbers are converted to a string using the
'%s' % value statement. I think that the '%s' % value should be
replaced with repl(value).

Another problem occurs when a cell has a number formatted as a text,
which can be achieved by typing for example '1.2 (an apostrophe
before the 1) in a cell. When this cell is read, then the data type is
number and not text. Of course, it is quite unusual to format a number
as a text.

Cheers,
Rob

did...@gmail.com

unread,
Nov 29, 2012, 7:24:28 PM11/29/12
to openpyx...@googlegroups.com, rvanha...@gmail.com
Actually, I have this exact problem. An excel file I'm given has values in it like 3E5 (these are model numbers). The cells within Excel display properly, and are formatted as text, but using the "Format cell" power, rather than the single leading quote power. When I read them in with openpyxl, they are all converting to numbers, so my output becomes 300000 instead of 3E5.

I'm still looking for an answer to this...

--Danny

Charlie Clark

unread,
Nov 30, 2012, 7:20:59 AM11/30/12
to openpyx...@googlegroups.com
Am 30.11.2012, 01:24 Uhr, schrieb <did...@gmail.com>:

> Actually, I have this exact problem. An excel file I'm given has values
> in
> it like 3E5 (these are model numbers). The cells within Excel display
> properly, and are formatted as text, but using the "Format cell" power,
> rather than the single leading quote power. When I read them in with
> openpyxl, they are all converting to numbers, so my output becomes 300000
> instead of 3E5.

Unfortunately it looks like the behaviour looks is exactly as expected.
From the cell tests:

def test_numeric(self):

def check_numeric(value):
self.cell.value = value
eq_(self.cell.TYPE_NUMERIC, self.cell.data_type)

values = (42, '4.2', '-42.000', '0', 0, 0.0001, '0.9999',
'99E-02', 1e1, '4', '-1E3', 4, decimal.Decimal('3.14'))
for value in values:
yield check_numeric, value


This is handled by the Cell.value property which always checks the type
when setting runs the regex on strings which will identifies strings with
"e" or "E" surrounded numbers as numbers.

A workaround is to assign directly to cell._value and cell._data_to 's',
but I think it should be possible to set the type in the constructor and
maybe disable type-checking if a type is manually set. Probably need a
dedicated set_typed_value(self, value, data_type) Or maybe simply infer
only if a value does not conform to the existing data_type or is None.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Charlie Clark

unread,
Nov 30, 2012, 7:28:01 AM11/30/12
to openpyx...@googlegroups.com
Am 30.11.2012, 13:20 Uhr, schrieb Charlie Clark
<charli...@clark-consulting.eu>:

> A workaround is to assign directly to cell._value and cell._data_to 's',
> but I think it should be possible to set the type in the constructor and
> maybe disable type-checking if a type is manually set. Probably need a
> dedicated set_typed_value(self, value, data_type) Or maybe simply infer
> only if a value does not conform to the existing data_type or is None.

Doh!

def set_value_explicit(self, value=None, data_type=TYPE_STRING):
"""Coerce values according to their explicit type"""

already exists. That should be your solution:

c.set_value_explicit('3e5')
c.value
u'3e5'
Reply all
Reply to author
Forward
0 new messages