So I've run across an error with certain xls files when using xlrd, version 1.0.0. When running "open_workbook()" I get a "KeyError(65535,)". But I can't easily provide an example file because the files contain sensitive information and we have no control over their creation. I can open it fine with Excel, but if I save it in any way, it then works just fine with xlrd, suggesting that it is created by some third-party software that is doing something quirky.
So this is the error:
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. | Traceback (most recent call last): File "C:\web2py\gluon\restricted.py", line 227, in restricted exec ccode in environment File "C:/web2py/applications/dealer_portal_dev/controllers/test.py", line 38, in <module> File "C:\web2py\gluon\globals.py", line 417, in <lambda> self._caller = lambda f: f() File "C:/web2py/applications/dealer_portal_dev/controllers/test.py", line 37, in test_parse xlrd.open_workbook(filepath, verbosity=1) File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook ragged_rows=ragged_rows, File "C:\Python27\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls bk.get_sheets() File "C:\Python27\lib\site-packages\xlrd\book.py", line 678, in get_sheets self.get_sheet(sheetno) File "C:\Python27\lib\site-packages\xlrd\book.py", line 669, in get_sheet sh.read(self) File "C:\Python27\lib\site-packages\xlrd\sheet.py", line 837, in read elif rc == XL_MULRK: File "C:\Python27\lib\site-packages\xlrd\sheet.py", line 683, in put_cell_unragged try: KeyError: 65535 |
So anyway it appears that one of the "xf_index" values that is read in from the file is "65535", but when this function (from Sheet.put_cell_unragged() in sheet.py) is run,
def put_cell_unragged(self, rowx, colx, ctype, value, xf_index):
if ctype is None:
# we have a number, so look up the cell type
ctype = self._xf_index_to_xl_type_map[xf_index]
... the index is not found in self._xf_index_to_xl_type_map, which is built up in the function "xf_epilogue" in formatting.py (called from "open_workbook_xls()" in book.py).
So here's my quick fix, almost certainly not the ultimate fix:
def put_cell_unragged(self, rowx, colx, ctype, value, xf_index):
if ctype is None:
# we have a number, so look up the cell type
# MANUAL OVERRIDE when xf_index is not found in map (ex. 65535, grossly large)
try:
ctype = self._xf_index_to_xl_type_map[xf_index]
except KeyError:
ctype = XL_CELL_TEXT
This same change would also need to be made to the "put_cell_ragged" function, obviously. This xl_type is almost certainly not used, but I need to set the ctype to something, and so I put in the same "fallback" xl type that is used in the "xf_epilogue" function when the _xf_index_to_xl_type_map object is built.
Okay, putting that aside, in my testing I ran into something else strange. If you use the "formatting_info=True" option when opening the workbook, I ran into a related error:
OverflowError: signed short integer is greater than maximum
In "Sheet.read()", the xf_index is unpacked as an "H"; that is, an unsigned short integer. However, as seen in this code from Sheet.put_cell_unragged():
try:
self._cell_types[rowx][colx] = ctype
self._cell_values[rowx][colx] = value
if self.formatting_info:
self._cell_xf_indexes[rowx][colx] = xf_index
except:
print("put_cell", rowx, colx, file=self.logfile)
raise
... the xf_index is put into self._cell_xf_indexes, but that column index is initialized as
self.bf = array('h', [-1])... that is, as a
signed short integer. So if the xf_index is "65535", as mine was, then it overflows that data type and throws an error. Changing the initialized type from 'h' to 'i' seems to fix the problem.
Whew! Hope this helps anyone having the same problem! Sorry for not including the bad file. As I said, posting it publically is not something I'm comfortable with, and I can't change the data without having Excel fix the problem automatically.
-- Jeremy