Re: [pyxl] IronPython / xlrd Encoding Problems


Chris Withers

May 16, 2013, 2:09:56 PM
to python...@googlegroups.com, Superquant
On 16/05/2013 02:50, Superquant wrote:
> I have run into some troubles trying to use xlrd in IronPython 2.7 -
> even the most basic open_workbook operation fails:
>
>  workbook = xlrd.open_workbook(xlsfile)
>
>   File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\xlrd\__init__.py", line 426, in open_workbook
>
>   TypeError: sequence item 0: expected bytes or byte array, str found
>
> The same file and syntax open correctly running in pure Python 2.7
>
> I found through some research that IronPython uses unicode for the str implementation and I imagine this may cause some problems (same as Python 3). The xlrd page however states that the library supports unicode and Python 3, so not clear what the problem is.

What version of xlrd are you using?

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

John Machin

May 16, 2013, 6:45:48 PM
to python-excel


On May 16, 11:50 am, Superquant <gmolin...@gmail.com> wrote:
> I have run into some troubles trying to use xlrd in IronPython 2.7 - even
> the most basic open_workbook operation fails:
>
>  workbook = xlrd.open_workbook(xlsfile)
>
>   File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\xlrd\__init__.py", line 426, in open_workbook
>
>   TypeError: sequence item 0: expected bytes or byte array, str found
> The same file and syntax open correctly running in pure Python 2.7

This is what I get:

IronPython 2.7.2.1 (2.7.0.40) on .NET 4.0.30319.296 (32-bit)
Type "help", "copyright", "credits" or "license" for more information.
>>> import xlrd
>>> xlrd.__VERSION__
'0.9.3dev'
>>> b = xlrd.open_workbook('xirr_demo.xls')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\IronPython 2.7\lib\site-packages\xlrd\__init__.py", line 426, in open_workbook
TypeError: can only join an iterable of bytes

Different error message, same location. However, it didn't fail with
0.8.0. Please replicate that (interpreter banner plus xlrd.__VERSION__)
so we can see which versions of IronPython and xlrd you are using.
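
One way to collect that information in a single step is sketched below. It uses only the standard platform module plus xlrd's __VERSION__ attribute (as shown in the session above); version_report is a hypothetical helper name, and the fallbacks for a missing xlrd are an illustrative choice:

```python
import platform


def version_report():
    """Return a one-line summary of the interpreter and xlrd versions (hypothetical helper)."""
    try:
        import xlrd
        xlrd_version = getattr(xlrd, "__VERSION__", "unknown")
    except ImportError:
        xlrd_version = "not installed"
    return "%s %s, xlrd %s" % (
        platform.python_implementation(),  # e.g. 'CPython' or 'IronPython'
        platform.python_version(),
        xlrd_version,
    )


print(version_report())
```

Pasting that one line into a reply gives both versions without copying the whole banner.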

Note that the statement at line 426 is merely a call to
Book.open_workbook_xls ... this has nothing to do with str/unicode/
bytes i.e. for some reason the full stack trace is not being displayed
-- makes debugging just a little harder.

>
> I found through some research that IronPython uses unicode for the str implementation and I imagine this may cause some problems (same as Python 3). The xlrd page however states that the library supports unicode and Python 3, so not clear what the problem is.

The comparison 'str is unicode' returns False for py2, fails (unicode
not defined) for py3, and returns True for ipy ===> minefield.
Also, last time I looked, the ipy ElementTree implementation was borked
===> xlrd can't support xlsx files.
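
One way to sidestep part of that minefield is to branch on bytes rather than on str: bytes is just another name for str on CPython 2, but a separate type on Python 3 and, as the "expected bytes ... str found" errors in this thread suggest, on IronPython 2.7. A minimal sketch under that assumption; to_text is a hypothetical helper and UTF-8 an assumed default encoding:

```python
try:
    STR_IS_UNICODE = str is unicode  # False on CPython 2, True on IronPython 2.7
except NameError:
    STR_IS_UNICODE = None            # Python 3: the name 'unicode' does not exist


def to_text(value, encoding="utf-8"):
    """Decode raw byte strings; pass text and non-strings through unchanged.

    ASSUMPTION: on every target interpreter, isinstance(x, bytes) is True only
    for raw bytes (or for CPython 2's native str, which is raw bytes).
    """
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


print(STR_IS_UNICODE)
for value in (b"HALLO", u"\xdf", 1.0):
    print(repr(to_text(value)))
```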

>
> Any suggestions or experience? I need to use IP for other native library integration issues.

Have you considered calling the native Excel-file-reading library?

Superquant

May 16, 2013, 10:07:14 PM
to python...@googlegroups.com
I am running xlrd 0.9.2 and IronPython 2.7.3.

I ended up trying the native COM Excel library, but it was so painful to work with that for now I just export the xls to CSV. I would prefer to use xlrd, of course... it sounds like I could try xlrd 0.8? I just need basic read access to rows, column ranges and cells.
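
For the CSV fallback, basic row and column-range access needs only the standard csv module. A minimal sketch, assuming the sheet was exported as a plain comma-separated file; read_range is a hypothetical helper, and its row/column indices are zero-based and inclusive:

```python
import csv


def read_range(lines, first_row, last_row, first_col, last_col):
    """Return cells from an inclusive, zero-based block of a CSV export.

    'lines' can be an open file object or any iterable of text lines.
    Rows shorter than the requested range simply yield fewer cells.
    """
    rows = list(csv.reader(lines))
    return [row[first_col:last_col + 1] for row in rows[first_row:last_row + 1]]


exported = ["Test,Michael", "1.0,HALLO", "2.0,abc"]
print(read_range(exported, 1, 2, 0, 1))  # [['1.0', 'HALLO'], ['2.0', 'abc']]
```

Unlike xlrd, every CSV cell comes back as a string, so numeric columns such as 1.0 need an explicit float() conversion.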

Superquant

May 16, 2013, 10:15:06 PM
to python...@googlegroups.com
I downgraded to xlrd 0.8, ran it again and got a different error than with 0.9.2:

File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\xlrd\__init__.py", line 426, in open_workbook
TypeError: can only join an iterable of bytes
Process finished with exit code 1



LuizLima

Aug 4, 2013, 9:39:14 PM
to python...@googlegroups.com
Hi Superquant (and also John / Chris)

Sorry for reviving this thread, but I am facing exactly the same problem you had and can't find a solution anywhere. Did you get it to work?

In summary, I was trying xlrd 0.9.2 with IronPython and getting an "expected str, got bytes" error. After downgrading to 0.8.0, the error changed to "can only join an iterable of bytes". On CPython, no errors are shown and the code runs just fine.

Thank you,

Luiz

P.S.: If I am breaking a forum rule by replying to this thread, I can start a new one.

Michael Overberg

Nov 1, 2013, 2:39:01 AM
to python...@googlegroups.com

Hi there,

I'm a newbie to Python, but I have to use it because it is the scripting language
in Codesys V3.5.

So first, many thanks to everyone who works or has worked on this package;
it's brilliant and easy to use.

I'm working with CPython/GTK and IronPython on WinXP or Win7.
These are two Pythons: one runs as the scripting language inside Codesys and the other
on Windows systems without Codesys, so that one code base works for both.

Now I have one problem:
If I parse an Excel file with CPython, everything seems OK.
But if I parse an Excel file with IronPython, the cell types cause problems.
As you can see in the output below, CPython prints the right type, "unicode".
But IronPython prints the wrong type, "str". Still, the encoding
seems to work, because I can see the right characters of my German language.

PS:
While trying to find out why the IronPython output differs and why tracebacks
come up, I fixed some issues along the way (see below).
Feel free to use this information.


Any suggestions or experience?





Versions:
############################
xlrd 0.9.2
Python 2.7.5
IronPython 2.7.3 (2.7.0.40)

Excel file (Excel 2010, saved in 2003 format):
############################
Sheet: ELCADExport
Data in columns A to B, rows 1 to 6:
Test,Michael
1.0,äöü
2.0,ß
3.0,µ
4.0,?
5.0,HALLO

Made the following changes:
#######################

xlrd\compdoc.py (near line 330):
###########
            if todo != 0:
                fprintf(self.logfile, 
                    "WARNING *** OLE2 stream %r: expected size %d, actual size %d\n",
                    name, size, size - todo)

        # Michael
        # Changed the return line to make this module work in IronPython 2.7.
        # I think .join gives different results in .NET.
        # Old line (line 335):
        # return b''.join(sectors)
        # My solution:
        f = b''
        for sector in sectors:
            f = f + sector
        return f
        

    def _dir_search(self, path, storage_DID=0):
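
The replacement loop above builds the result one sector at a time. A variant that also tolerates sectors arriving as text rather than bytes is sketched below. It rests on an unverified assumption: that on IronPython each character of such a text chunk stands for exactly one byte (code point 0 to 255). join_sectors is a hypothetical name, and the sketch has not been tested against xlrd on IronPython:

```python
def join_sectors(sectors):
    """Concatenate OLE2 sector chunks into a single bytes object.

    ASSUMPTION: a chunk that is not bytes/bytearray is a text string whose
    characters each encode one byte (0-255); anything larger raises ValueError.
    """
    buf = bytearray()
    for sector in sectors:
        if isinstance(sector, (bytes, bytearray)):
            buf.extend(sector)
        else:
            buf.extend(ord(ch) for ch in sector)  # text chunk -> byte values
    return bytes(buf)


print(repr(join_sectors([b"\xd0\xcf", u"\x11\xe0", bytearray(b"\xa1")])))
```

Accumulating into a bytearray also avoids copying the growing result on every iteration, which the f = f + sector loop does.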

xlrd\__init__.py (near line 323):
(This didn't change anything in the output!)
###########
from .xldate import XLDateError, xldate_as_tuple

# Michael
# Changed the check so that the encodings import works in IronPython 2.7.
# sys.version does not start with "IronPython"; it starts with the version information.
# Old line (line 325):
# if sys.version.startswith("IronPython"):
# My solution:
if "IronPython" in sys.version:
    # print >> sys.stderr, "...importing encodings"
    import encodings

try:

xlrd\book.py (near line 15):
(This didn't change anything in the output!)
###########
from . import formatting

# Michael
# Changed the check so that the encodings import works in IronPython 2.7.
# sys.version does not start with "IronPython"; it starts with the version information.
# Old line (line 16):
# if sys.version.startswith("IronPython"):
# My solution:
if "IronPython" in sys.version:
    # print >> sys.stderr, "...importing encodings"
    import encodings

empty_cell = sheet.empty_cell # for exposure to the world ...
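
Since IronPython's sys.version does not begin with "IronPython", the startswith() test never matches there. The standard library also offers checks that do not depend on the layout of that string; a minimal sketch (running_on_ironpython is a hypothetical helper, not part of xlrd):

```python
import platform
import sys


def running_on_ironpython():
    """Detect IronPython without parsing sys.version."""
    # platform.python_implementation() returns 'IronPython' on IronPython,
    # which also reports sys.platform as 'cli' (.NET Common Language Infrastructure).
    return platform.python_implementation() == "IronPython" or sys.platform == "cli"


if running_on_ironpython():
    import encodings  # same conditional import as in the patched xlrd modules
```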



Outputs of both Pythons:
############################

CPython:
############################

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, r'Z:\Daten\Uebungen_Michael\Uebungen Pygtk2\UtilityExcel\UtilitiesExcel\UtilitiesExcel\resources\Excel')
>>> from xlrd import open_workbook
>>> wb = open_workbook( filename = r'C:\Temp\test.xls' )
>>> encoding = wb.encoding
>>> print ( encoding  )
utf_16_le
>>> for s in wb.sheets():
...     print type(s.name)
...     if type(s.name) == 'unicode':
...         print 'Sheet:',s.name.encode(encoding )
...     else:
...         print 'Sheet:',s.name
...     for row in range(s.nrows):#(s.nrows): (0,120):
...         values = []
...         for col in range(s.ncols):
...             values.append(s.cell(row,col).value)
...         print values
...         print type(values[0])
...         if type(values[0]) == 'unicode':
...             print values[0].encode(encoding )
...         else:
...             print values[0]
...         print type(values[1])
...         if type(values[1]) == 'unicode':
...             print values[1].encode(encoding )
...         else:
...             print values[1]
...
<type 'unicode'>
Sheet: ELCADExport
[u'Test', u'Michael']
<type 'unicode'>
Test
<type 'unicode'>
Michael
[1.0, u'\xe4\xf6\xfc']
<type 'float'>
1.0
<type 'unicode'>
äöü
[2.0, u'\xdf']
<type 'float'>
2.0
<type 'unicode'>
ß
[3.0, u'\xb5']
<type 'float'>
3.0
<type 'unicode'>
µ
[4.0, u'?']
<type 'float'>
4.0
<type 'unicode'>
?
[5.0, u'HALLO']
<type 'float'>
5.0
<type 'unicode'>
HALLO
>>>
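
In this session (and the IronPython one below), tests such as type(values[0]) == 'unicode' compare a type object with the string 'unicode'. They are never equal, so the else branch always runs and the encode() calls are never reached. A check that does what was presumably intended uses isinstance(); the sketch below relies only on type(u"") being the text type on each interpreter (unicode on CPython 2 and IronPython 2.7, str on Python 3). describe_cell is a hypothetical name:

```python
TEXT_TYPE = type(u"")  # unicode on CPython 2 and IronPython 2.7, str on Python 3


def describe_cell(value):
    """Classify a cell value with isinstance() instead of comparing a type to a string."""
    if isinstance(value, TEXT_TYPE):
        return "text"
    if isinstance(value, float):
        return "number"
    return type(value).__name__


print(type(u"Test") == "unicode")  # always False: a type object never equals a str
for value in [u"Test", 1.0, u"\xe4\xf6\xfc"]:
    print(describe_cell(value))
```

Even with the check fixed, wb.encoding (utf_16_le here) is the encoding xlrd derived from the workbook's codepage, not the console's, so encoding cell text with it for printing would emit UTF-16 bytes. Printing the unicode value directly, which is what both transcripts end up doing, is the simpler route.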


IronPython:
############################
IronPython 2.7.3 (2.7.0.40) on .NET 4.0.30319.1 (32-bit)
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, r'Z:\Daten\Uebungen_Michael\Uebungen Pygtk2\UtilityExcel\UtilitiesExcel\UtilitiesExcel\resources\Excel')
>>> from xlrd import open_workbook
>>> wb = open_workbook( filename = r'C:\Temp\test.xls' )
>>> encoding = wb.encoding
>>> print ( encoding  )
utf_16_le
>>> for s in wb.sheets():
...     print type(s.name)
...     if type(s.name) == 'unicode':
...         print 'Sheet:',s.name.encode(encoding )
...     else:
...         print 'Sheet:',s.name
...     for row in range(s.nrows):#(s.nrows): (0,120):
...         values = []
...         for col in range(s.ncols):
...             values.append(s.cell(row,col).value)
...         print values
...         print type(values[0])
...         if type(values[0]) == 'unicode':
...             print values[0].encode(encoding )
...         else:
...             print values[0]
...         print type(values[1])
...         if type(values[1]) == 'unicode':
...             print values[1].encode(encoding )
...         else:
...             print values[1]
...
<type 'str'>
Sheet: ELCADExport
['Test', 'Michael']
<type 'str'>
Test
<type 'str'>
Michael
[1.0, u'\xe4\xf6\xfc']
<type 'float'>
1.0
<type 'str'>
äöü
[2.0, u'\xdf']
<type 'float'>
2.0
<type 'str'>
ß
[3.0, u'\xb5']
<type 'float'>
3.0
<type 'str'>
µ
[4.0, '?']
<type 'float'>
4.0
<type 'str'>
?
[5.0, 'HALLO']
<type 'float'>
5.0
<type 'str'>
HALLO
>>>