using xlrd and xlwt to read combine multiple excel spreadsheets

1,774 views
Skip to first unread message

Jordan Cheney

unread,
Jul 14, 2011, 12:22:26 PM7/14/11
to python-excel
I am doing a project where I need to take data from 1000+
spreadsheets, reformat and then recombine into one spreadsheet. I am
using xlrd and xlwt to do this and have been successful taking data
from 1 spreadsheet and reformatting and returning to excel. To save
time I would like my program to do all 1000 at once rather then me
having to type in each file name.


import xlrd
import xlwt

def ReadandWrite(rownum, swb, fwb, Fsheet):
sheet = swb.sheet_by_index(0)

RList = [str(v).strip() for v in
sheet.col_values(colx=2,start_rowx=1, end_rowx=13)]

emptyCell = sheet.cell(1,3).value

rowEmpty = False
r = 15

while not rowEmpty:
print r
rowEmpty = all(val == emptyCell for val in
sheet.row_values(r,end_colx=8))
print rowEmpty
r+=1

rs = rownum

for rows in range(15,r-1):
Flist = []
cs = 0
IList = [str(x).strip() for x in
sheet.row_values(rows,end_colx=8)]

FList = RList + IList
for value in FList:
Fsheet.write(rs,cs,value)
cs+=1
rs+=1

return rs

def main():
fwb = xlwt.Workbook()
Fsheet = fwb.add_sheet('Final Formatted Data')
rownum = 0
for name in range(1,3):
fname = str(name)
fname = fname + ".xls"
swb = xlrd.open_workbook(fname)
rownum = ReadandWrite(rownum, Swb, fwb, Fsheet)

fwb.save('test.xls')

main()



This is my code so far, I am running into an error at line 14
"rowEmpty = all(val == emptyCell for val in
sheet.row_values(r,end_colx=8))", the error is "list index out of
range" and was wondering if anyone could help me.

John Machin

unread,
Jul 14, 2011, 5:17:54 PM7/14/11
to python...@googlegroups.com


On Friday, 15 July 2011 02:22:26 UTC+10, Jordan Cheney wrote:
I am doing a project where I need to take data from 1000+
spreadsheets, reformat and then recombine into one spreadsheet. I am
using xlrd and xlwt to do this and have been successful taking data
from 1 spreadsheet and reformatting and returning to excel. To save
time I would like my program to do all 1000 at once rather then me
having to type in each file name.


Max # rows in your output sheet == 65536
65536 // 1000 == 65
Not a problem?

 
import xlrd
import xlwt

def ReadandWrite(rownum, swb, fwb, Fsheet):
  sheet = swb.sheet_by_index(0)

  RList = [str(v).strip() for v in
sheet.col_values(colx=2,start_rowx=1, end_rowx=13)]

  emptyCell = sheet.cell(1,3).value

  rowEmpty = False
  r = 15

  while not rowEmpty:
    print r
    rowEmpty = all(val == emptyCell for val in
sheet.row_values(r,end_colx=8))

When asking a question about an exception that you don't understand, please include the full error message and full traceback. In general; to do otherwise is to waste both your time and that of your audience.

What was the last output from "print r"?
What does "print sheet.nrows" tell you?
You say it worked with one input file; did that have any empty rows at the end of the sheet?

    print rowEmpty
    r+=1

  rs = rownum

  for rows in range(15,r-1):
    Flist = []
    cs = 0
    IList = [str(x).strip() for x in
sheet.row_values(rows,end_colx=8)]

Why are you converting all your numbers to text?

    FList = RList + IList
    for value in FList:
      Fsheet.write(rs,cs,value)
      cs+=1
Suggestion:

for cs, value in enumerate(Flist):
    Fsheet.write(rs, cs, value)

In the interests of clarity and robustness, you really should compress your two loops over the rows into one loop, something like this:

# pseudocode
for rowx in xrange(15, sheet.nrows):
    if the rowx row is empty:
        break # or maybe continue
    write the values for the rowx row

Suggestion: read this:
http://www.python.org/dev/peps/pep-0008/
Pay particular attention to:
    indent by 4 spaces (not 2)
    conventions for variable names (e.g. empty_cell, not emptyCell)
    space around operators (e.g. foo += 1)
 

Jordan Cheney

unread,
Jul 15, 2011, 11:58:35 AM7/15/11
to python-excel
@John

I will keep the overall length of the spreadsheet in mind, thanks for
pointing that out. The full error and traceback call for this is

Traceback (most recent call last):
File "C:\Python27\pythontryout.py", line 47, in
(module) main()
File "C:\Python27\pythontryout.py", line 43, in main
rownum = ReadandWrite(rownum, swb, fwb, Fsheet)
File "C:\Python27\pythontryout.py", line 16, in
ReadandWrite rowEmpty = all(val == emptyCell for
val in sheet.row_values(r,end_colx=8))
File "C:\Python27\lib\site-packages\xlrd\sheet.py",
line 312, in row_values return self._cell_values[rowx]
[start_colx:end_colx]
IndexError: list index out of range

print r shows 42 right before the error which is the first empty line
of the excel spreadsheet. print sheet.numrows says the spreadsheet
contains 42 rows. I convert all of my floats to strings because they
were imported with white space and I wanted to strip them. The
ReadandWrite function worked with this excel file when I eliminated
the main function and didn't try and loop through two excel files at
once.
> > range" and was wondering if anyone could help me.- Hide quoted text -
>
> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -

John Machin

unread,
Jul 15, 2011, 6:11:46 PM7/15/11
to python...@googlegroups.com


On Saturday, 16 July 2011 01:58:35 UTC+10, Jordan Cheney wrote:
@John

I will keep the overall length of the spreadsheet in mind, thanks for
pointing that out. The full error and traceback call for this is

Traceback (most recent call last):
  File "C:\Python27\pythontryout.py", line 47, in
 (module) main()
  File "C:\Python27\pythontryout.py", line 43, in main
    rownum = ReadandWrite(rownum, swb, fwb, Fsheet)
  File "C:\Python27\pythontryout.py", line 16, in
    ReadandWrite rowEmpty = all(val == emptyCell for
    val in sheet.row_values(r,end_colx=8))
  File "C:\Python27\lib\site-packages\xlrd\sheet.py",
    line 312, in row_values return self._cell_values[rowx]
    [start_colx:end_colx]
IndexError: list index out of range

print r shows 42 right before the error which is the first empty line
of the excel spreadsheet. print sheet.numrows says the spreadsheet
contains 42 rows.

You mean sheet.nrows. If that contains 42, then the legal row indices are 0, 1, ....., 40, 41. If r is 42, you are trying to address off the end of the sheet. You seem to have ignored my suggestion to use


for rowx in xrange(15, sheet.nrows):

Why?

 
I convert all of my floats to strings because they
were imported with white space and I wanted to strip them.

If the "floats" in your input workbooks contain leading/tyrailing whitespace, they are already text cells. If leaving them as text is acceptable, then you you don't need xlwt -- a CSV file is good enough. However if somebody wants to do some calculations, it would be better if you wrote number cells to the spreadsheet. The following function will convert a cell value to float if possible. The built-in float() function strips leading and trailing whitespace.

| >>> def refloat(value):
| ...     try:
| ...         return float(value)
| ...     except ValueError:
| ...         return value
| ...
| >>> refloat('    123.456    ')
| 123.456
| >>> refloat('a heading')
| 'a heading'
| >>> refloat(3.1416)
| 3.1416

Here is a function that does what you seem to want to do. You'll notice that (inter alia) it is much shorter than the original, has no while loops, and has only one (necessary) occurrence of thing += 1 ... and looks a bit more like Python code :-)

def append_sheet_values(in_sheet, out_sheet, out_rowx):
    rlist = [str(v).strip() for v in in_sheet.col_values(colx=2, start_rowx=1, end_rowx=13)]
    empty_cell = in_sheet.cell_value(1, 3)
    for in_rowx in xrange(15, in_sheet.nrows):
        raw_data = in_sheet.row_values(rowx=in_rowx, end_colx=8)
        if all(val == empty_cell for val in raw_data):

             break # or maybe continue
        flist = rlist + [str(v).strip() for v in raw_data]
        for out_colx, value in enumerate(flist):
            out_sheet.write(out_rowx, out_colx, value)
        out_rowx += 1
    return out_rowx # start row_index for next sheet

HTH,
John

Reply all
Reply to author
Forward
0 new messages