xlrd - 287100 reading improperly

120 views
Skip to first unread message

Jody

unread,
Sep 3, 2010, 3:35:22 PM9/3/10
to python-excel
I have a spreadsheet with about 15k rows and 2 columns. Col1 = index.
Col2 = some text. All of the numbers in Col1 are being read in with
an appended .0 at the end - which is mildly annoying, but possibly
related to the real problem. About 5 of the cells from column 1 are
not being read properly. The read value contains a colon typically,
instead of a 1, though out of order.

"287100" is being read in as "22870:0.0"
"281210" is being read in as "28120:.0"
"260100" is being read in as "2600:0.0"
"236210" is being read in as "23620:.0"

There's a bit of a pattern here, but I haven't the foggiest what could
be causing this. It's repeatable in brand new spreadsheets and gives
pause as my app needs to retain this ID correctly to serve any
function. Obviously 5 out of 15k isn't too bad, but nevertheless.
Anyone have any idea what's going on here?

I'm using this code to test. (same problem happens if you do
sheet.cell(cell,0).value though)

import xlrd
wb = open_workbook('test.xls')
sheet = wb.sheet_by_index(0)

cells = [0,1]
for cell in cells:
print str(sheet.cell_type(cell,0)) + str(sheet.cell_value(cell,0))


Chris Withers

unread,
Sep 4, 2010, 6:36:19 AM9/4/10
to python...@googlegroups.com
On 03/09/2010 20:35, Jody wrote:
> I have a spreadsheet with about 15k rows and 2 columns. Col1 = index.
> Col2 = some text. All of the numbers in Col1 are being read in with
> an appended .0 at the end - which is mildly annoying, but possibly
> related to the real problem. About 5 of the cells from column 1 are
> not being read properly. The read value contains a colon typically,
> instead of a 1, though out of order.
>
> "287100" is being read in as "22870:0.0"
> "281210" is being read in as "28120:.0"
> "260100" is being read in as "2600:0.0"
> "236210" is being read in as "23620:.0"

I'm afraid the above isn't very coherent an explanation of what data you
expect to be in these cells and why. Where does this sheet come from?
What leads you to have your expectations about the cell contents?

> function. Obviously 5 out of 15k isn't too bad, but nevertheless.

That's an alarming statement. Intermittent, unexplained failures would
be far more worrying that every single cell behaving in a particular
fashion. I suspect these cells are of a different type, but you've given
no indication of where the workbook originates from.

Can you provide a small workbook and *actual code* (not snippets) that
demonstrates the problem?

> I'm using this code to test. (same problem happens if you do
> sheet.cell(cell,0).value though)
>
> import xlrd
> wb = open_workbook('test.xls')
> sheet = wb.sheet_by_index(0)
>
> cells = [0,1]
> for cell in cells:
> print str(sheet.cell_type(cell,0)) + str(sheet.cell_value(cell,0))

This doesn't match with your quoted output above, it'd also raise an
exception if run. It's also a rather odd way of doing things...

What I would do is install xlutils, and then do:

from xlrd import open_workbook
from xlutils.display import cell_display


wb = open_workbook('test.xls')
sheet = wb.sheet_by_index(0)

for rowx in range(1):
print cell_display(sheet.cell(rowx,0)

...and see if you find the output enlightening.

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

Jody

unread,
Sep 4, 2010, 8:47:15 PM9/4/10
to python-excel
Thanks for your lengthy response. Oddly enough I can't recreate this
issue on my home computer (win7 64bit)- I was originally using my work
machine (winxp 32bit). Not sure what difference the environments
would make, but I'll follow up after a reboot and maybe try another
machine or something. Here are answers to your questions below in
case anyone else ever has this problem.

I'll try the xlutils on tuesday when i'm back at work.

The sheet is originally from an export of a sql table. However, I am
able to recreate the problem with a blank workbook with a single value
that matches one of the 4 i listed
> "287100"
> "281210"
> "260100"
> "236210"

What leads me to have my expectations of the content of the cells is
simply that is what I see if I open the workbook, i see the number
287100.... but when i use the library to read the number it says
2870:0.0 (sorry, i had typo'd an extra two in my original post) -- no
idea where the colon is coming from.

The cells are of a consistent type either number or text returns the
same problem.

My code was not a snippet. I created a tiny script from scratch and
ran it just for debugging purposes and was able to duplicate the
problem. I did forget the xlrd. before openworkbook. And sure, a
very odd way of doing it. But I'm new, and just trying to debug
here.

I was being facetious about the 5 out of 15k thing. Sorry sarcasm
doesn't translate well when written without smileys. Thanks again.

Jody

unread,
Sep 7, 2010, 11:42:20 AM9/7/10
to python-excel
The results of the slutils test...

number (287010.0000)
number (2870:0.0000)
text (287100)
text (thomas)
number (2870:0.0000)
number (28120:.0000)
number (2600:0.0000)
number (23620:.0000)
text (281210)
text (260100)
text (236210)

For a spreadsheet with data values as follows
287010
287100
287100
thomas
287100
281210
260100
236210
281210
260100
236210

The ones marked as text by xlutils I had converted to text by putting
an apostrophe (') in front. Just converting them to text by setting
the "format cells">>"Number" to "text" did nothing.

I've put an apostrophe at the beginning of all my id's and am
stripping it out after I retrieve it. This is an annoying, but
effective workaround.

On Sep 4, 6:36 am, Chris Withers <ch...@simplistix.co.uk> wrote:

Chris Withers

unread,
Sep 24, 2010, 6:29:32 AM9/24/10
to python...@googlegroups.com, Jody
On 05/09/2010 01:47, Jody wrote:
> The sheet is originally from an export of a sql table. However, I am
> able to recreate the problem with a blank workbook with a single value
> that matches one of the 4 i listed
>> "287100"
>> "281210"
>> "260100"
>> "236210"

It's not clear what you're trying to say here. Perhaps you could attach
said blank workbook so we can have a look?

> What leads me to have my expectations of the content of the cells is
> simply that is what I see if I open the workbook,

In what? Excel? With it's well known habbit of formatting data when you
open it rather than actually showing you what's in a cell?

> 287100.... but when i use the library to read the number it says
> 2870:0.0 (sorry, i had typo'd an extra two in my original post) -- no
> idea where the colon is coming from.

Again, copy'n'paste, don't type from memory. I still have no idea what
"2870:0.0" is or where you're getting it from...

Chris

Jody

unread,
Sep 24, 2010, 9:29:39 AM9/24/10
to python-excel
Thanks again for following up again Chris.

Sorry for the confusion. I've attached 2 files that should hopefully
help.
test excel colon problem.xls - small test spreadsheet with some
offending values, and some non offending values
test excel colon problem.py - test script I've been using to recreate
the issue

Yes, I'm familiar with excel storing data differently from how it is
displayed. But this is particularly odd. I'm getting the "2870:0.0"
simply from reading a cell in excel and then printing it to the
screen. XLRD seems to be the culprit - combined with some
environmental factor on my machine here as i'm only seeing this issue
on one of my 2 machines (unfortunately, it's the one i need to use
most often). I have no idea how it's coming up with that value with
the colon it, which is what brought me here. :)

Results of the test script on my machine (again, I only see this
problem on my work computer, not my home machine):

2287010.0
22870:0.0
1287100
1thomas
22870:0.0
228120:.0
22600:0.0
223620:.0
1281210
1260100
1236210

John Machin

unread,
Sep 26, 2010, 5:19:44 AM9/26/10
to python...@googlegroups.com
On 24/09/2010 11:29 PM, Jody wrote:
> Thanks again for following up again Chris.
>
> Sorry for the confusion. I've attached 2 files that should hopefully
> help.
> test excel colon problem.xls - small test spreadsheet with some
> offending values, and some non offending values
> test excel colon problem.py - test script I've been using to recreate
> the issue
>
> Yes, I'm familiar with excel storing data differently from how it is
> displayed. But this is particularly odd. I'm getting the "2870:0.0"
> simply from reading a cell in excel and then printing it to the
> screen. XLRD seems to be the culprit - combined with some
> environmental factor on my machine here as i'm only seeing this issue
> on one of my 2 machines (unfortunately, it's the one i need to use
> most often). I have no idea how it's coming up with that value with
> the colon it, which is what brought me here. :)
>
> Results of the test script on my machine (again, I only see this
> problem on my work computer, not my home machine):
>
> 2287010.0

It's interesting that the first line doesn't have a problem:
287010 -> OK
281210 -> bad
236210 -> bad

> 22870:0.0
> 1287100
> 1thomas
> 22870:0.0
> 228120:.0
> 22600:0.0
> 223620:.0
> 1281210
> 1260100
> 1236210

Please save the following code as (say) Jody_colon.py and run it from
the command prompt specifying the name of your file (wrapped in quotes
if you persist in having spaces in the name).

=== start of Jody_colon.py ===
import sys
print 'sys.version:', sys.version
print 'sys.stdout.encoding:', sys.stdout.encoding

import locale
print 'locale.getdefaultlocale:', locale.getdefaultlocale()

import platform
print 'platform.platform:', platform.platform()

print

# Test that str and repr work OK with the suspect numbers
for x in [287010.0, 287100.0, 281210.0, 260100.0, 236210.0]:
print x, str(x), repr(x)
print


import xlrd
print 'xlrd.__VERSION__', xlrd.__VERSION__

wb = xlrd.open_workbook(sys.argv[1])
sheet = wb.sheet_by_index(0)

for rowx in xrange(sheet.nrows):
value = sheet.cell_value(rowx, 0)
print ("row# %d, ty %d, str<%s>, repr<%r>"
% (rowx + 1, sheet.cell_type(rowx,0), value, value)
)
=== end of Jody_colon.py ===

When I run this, I get the following on my computer:

=== start of output ===
C:\wherever\Jody_colon>\python27\python Jody_colon.py Jody_colon.xls
sys.version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit
(Intel)]
sys.stdout.encoding: cp850
locale.getdefaultlocale: ('en_AU', 'cp1252')
platform.platform: Windows-XP-5.1.2600-SP3

287010.0 287010.0 287010.0
287100.0 287100.0 287100.0
281210.0 281210.0 281210.0
260100.0 260100.0 260100.0
236210.0 236210.0 236210.0

xlrd.__VERSION__ 0.7.1
row# 1, ty 2, str<287010.0>, repr<287010.0>
row# 2, ty 2, str<287100.0>, repr<287100.0>
row# 3, ty 1, str<287100>, repr<u'287100'>
row# 4, ty 1, str<thomas>, repr<u'thomas'>
row# 5, ty 2, str<287100.0>, repr<287100.0>
row# 6, ty 2, str<281210.0>, repr<281210.0>
row# 7, ty 2, str<260100.0>, repr<260100.0>
row# 8, ty 2, str<236210.0>, repr<236210.0>
row# 9, ty 1, str<281210>, repr<u'281210'>
row# 10, ty 1, str<260100>, repr<u'260100'>
row# 11, ty 1, str<236210>, repr<u'236210'>
=== end of output ===

Essentially the same output from Python 2.X for 3 <= X <= 7. In short, I
can't reproduce your problem. Please show the output (copy/paste, don't
edit/retype) that you get on both your home computer and your work computer.

Jody

unread,
Sep 27, 2010, 2:47:45 PM9/27/10
to python-excel
WORK computer (with the problem).

C:\Source>Jody_colon.py "C:\Source\excelpractice\test excel colon
problem.xls"
sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) [MSC v.1500 32 bit
(Intel)]
sys.stdout.encoding: cp437
locale.getdefaultlocale: ('en_US', 'cp1252')
platform.platform: Windows-XP-5.1.2600-SP3

287010.0 287010.0 287010.0
287100.0 287100.0 287100.0
281210.0 281210.0 281210.0
260100.0 260100.0 260100.0
236210.0 236210.0 236210.0

xlrd.__VERSION__ 0.7.1
row# 1, ty 2, str<287010.0>, repr<287010.0>
row# 2, ty 2, str<287100.0>, repr<287100.0>
row# 3, ty 1, str<287100>, repr<u'287100'>
row# 4, ty 1, str<thomas>, repr<u'thomas'>
row# 5, ty 2, str<287100.0>, repr<287100.0>
row# 6, ty 2, str<281210.0>, repr<281210.0>
row# 7, ty 2, str<260100.0>, repr<260100.0>
row# 8, ty 2, str<236210.0>, repr<236210.0>
row# 9, ty 1, str<281210>, repr<u'281210'>
row# 10, ty 1, str<260100>, repr<u'260100'>
row# 11, ty 1, str<236210>, repr<u'236210'>

Home computer coming tonight when I get home.

John Machin

unread,
Sep 27, 2010, 6:04:16 PM9/27/10
to python...@googlegroups.com
On 28/09/2010 4:47 AM, Jody wrote:
> WORK computer (with the problem).

WHAT PROBLEM? Your output below is identical to mine (apart from the
minor locale/encoding differences. I see no colon here.

John Machin

unread,
Sep 27, 2010, 7:22:49 PM9/27/10
to python...@googlegroups.com
On 28/09/2010 4:47 AM, Jody wrote:
> WORK computer (with the problem).
>
> C:\Source>Jody_colon.py "C:\Source\excelpractice\test excel colon
> problem.xls"
> sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) [MSC v.1500 32 bit
> (Intel)]


> On Sep 26, 5:19 am, John Machin<sjmac...@lexicon.net> wrote:
>>
>> When I run this, I get the following on my computer:
>>
>> === start of output ===
>> C:\wherever\Jody_colon>\python27\python Jody_colon.py Jody_colon.xls
>> sys.version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit
>> (Intel)]

HMMMM:

Jody (work): sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) etc
John : sys.version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) etc

I have just now downloaded the Windows x86 msi from www.python.org and
reinstalled it on another computer. It gives the same result as on my
primary computer (above).

Looks whacked: lower revision number, later date.

Jody

unread,
Sep 27, 2010, 11:08:15 PM9/27/10
to python-excel
Very odd indeed. Perhaps a print without str() or repr(). I'll try
that and post tomorrow. Plus a reinstall. Thanks again!


Here is home computer for the sake of completeness.

C:\Users\Jody\Documents>jody_colon.py "C:\Users\Jody\Documents\test
excel colon
problem.xls"
sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) [MSC v.1500 32 bit
(Intel)]
sys.stdout.encoding: cp437
locale.getdefaultlocale: ('en_US', 'cp1252')
platform.platform: Windows-7-6.1.7600

287010.0 287010.0 287010.0
287100.0 287100.0 287100.0
281210.0 281210.0 281210.0
260100.0 260100.0 260100.0
236210.0 236210.0 236210.0

xlrd.__VERSION__ 0.7.1
row# 1, ty 2, str<287010.0>, repr<287010.0>
row# 2, ty 2, str<287100.0>, repr<287100.0>
row# 3, ty 1, str<287100>, repr<u'287100'>
row# 4, ty 1, str<thomas>, repr<u'thomas'>
row# 5, ty 2, str<287100.0>, repr<287100.0>
row# 6, ty 2, str<281210.0>, repr<281210.0>
row# 7, ty 2, str<260100.0>, repr<260100.0>
row# 8, ty 2, str<236210.0>, repr<236210.0>
row# 9, ty 1, str<281210>, repr<u'281210'>
row# 10, ty 1, str<260100>, repr<u'260100'>
row# 11, ty 1, str<236210>, repr<u'236210'>

John Machin

unread,
Sep 28, 2010, 1:05:50 AM9/28/10
to python...@googlegroups.com
On 28/09/2010 1:08 PM, Jody wrote:
> Very odd indeed. Perhaps a print without str() or repr().

AFAIK print x will call x.__str__ if it exists otherwise x.__repr__ if
it exists otherwise it will fake it e.g. '<__main__.Bar object at
0x00D54890>'

>>
>> Jody (work): sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) etc
>> John : sys.version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) etc
>>
>> I have just now downloaded the Windows x86 msi fromwww.python.organd
>> reinstalled it on another computer. It gives the same result as on my
>> primary computer (above).
>>
>> Looks whacked: lower revision number, later date.

My correspondents tell me that it appears that you are using an
ActiveState distribution instead of the python.org distribution, and
that this shouldn't make any difference.


Jody

unread,
Sep 28, 2010, 10:14:23 AM9/28/10
to python-excel
Yes, that is the case - I am using the ActiveState distribution. The
other thing I failed to mention is that I'm using PyScripter for my
IDE. Why in the world would that matter? Well, because it's the only
one giving me bad data - ok, so I'm still not sure why that matters.
Running these scripts from the command line give me good results;
WinPython gives me good results, and so does IDLE. PyScripter isn't
even print the raw numbers correctly. (for x in [287010.0, 287100.0,
281210.0, 260100.0, 236210.0]: print x)

So, I guess I will be taking up my issues with them (or switch
IDE's). Thank you gentlemen very much. You've been very helpful!

Jody

unread,
Sep 28, 2010, 1:15:57 PM9/28/10
to python-excel
This is confirmed as an issue with Pyscripter. See the thread here
with a workaround:

http://groups.google.com/group/pyscripter/browse_thread/thread/7514d70b9ab3ff63

Георги Георгиев

unread,
Sep 29, 2010, 6:08:50 AM9/29/10
to python...@googlegroups.com
side note: i use eclipse with pydev plugin and i am very happy with
it :)

--
Georgi Georgiev
DIRECT SERVICES Ltd.
tel: +359-2-9609754
+359-2-9609737
fax: +359-2-9609738
www.directservices.bg

Reply all
Reply to author
Forward
0 new messages