Can you provide the full traceback please?
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
But you suspect that the error is happening when the input xls is opened
(which appears to be BEFORE the data is accessed) ... which for loop
is that?
> OS Unix 5.9, Python version 2.4.2.
What is "Unix 5.9"? Please tell us what you get when you run this:
import sys
print sys.platform
print sys.version
>
> On Aug 17, 4:00 pm, jennyshk <jenny...@gmail.com> wrote:
>> Hi,
>>
>> An error 'EnvironmentError: [Errno 22] Invalid argument' occurs when I
>> run my script which basically opens up file and writes to it until the
>> loop has finished. The script runs fine after several times,
after several times of what?
>> but then
>> in the middle
in the middle of what?
>> it just throws this error and I suspect its something to
>> do with opening the workbook in the method below.
WHY do you so suspect?
>> I'm unsure why? I'm
>> still quite new to python but I've followed this code closely to the
>> examples provided in the tutorials, maybe I'm missing something?? Do I
>> need to close the workbook at the end or does saving it enough?
There is no close() method for either input or output workbook, because
none is (should be?) needed. The input file is closed when reading is
finished. The output file is closed when writing is finished.
>>
>> def collectDBSubtotal(database, subtotal, filename, lastmonth):
>> #todaysDate = datetime.date(2000,1,1).today()
>>
>> rb = open_workbook(save_file+filename,formatting_info=True) <-----
A high proportion of results from a Google search for the error message
"EnvironmentError: [Errno 22] Invalid argument" mention mmap; please add
", use_mmap=False" to the open_workbook args and try again and let us
know the result.
>> Error exists here
It would help enormously (with any problem with an unexpected or
mystifying Python exception) if you were to give the full traceback and
error message.
What is in "save_file"? Is it a global?
>> readSheet = rb.sheet_by_index(0) #reference to read subtotal
>> print "get sheet by index"
>> wb = copy(rb)
This would be xlutils.copy(), would it?
>> print "Copied workbook"
>> writeSheet = wb.get_sheet(0) #reference to write subtotal
>> print "About to writesheet"
>> if readSheet.nrows != 0 :
>> found = 0
>> for rowx in xrange(readSheet.nrows) :
>> if rowx != 0 :
Nothing to do with your problem, but consider compressing the previous
two lines into one line by using xrange(1, readSheet.nrows)
>> readValue = readSheet.cell_value(rowx, 0)
>> rowNum = str(rowx+1)
>> if readValue == database.lower() :
>> writeSheet.write(rowx, lastmonth, subtotal)
>> writeSheet.write(rowx, 13, Formula('SUM($B$'+rowNum
>> +':$M$'+rowNum+')'))
>> found = 1
>> else :
>> writeSheet.write(rowx, 13, Formula('SUM($B$'+rowNum
>> +':$M$'+rowNum+')'))
>> if not found : #db not found, add at the end
>> writeSheet.write(readSheet.nrows, 0, database.lower())
>> #write dbname
>> writeSheet.write(readSheet.nrows, lastmonth, subtotal)
>> #write subtotal
>> rowNum = str(readSheet.nrows+1)
>> writeSheet.write(readSheet.nrows, 13, Formula('SUM($B
>> $'+rowNum+':$M$'+rowNum+')')) #write formula
>> else :#start writing some rows
>> writeSheet.write(1, 0, database.lower())
>> writeSheet.write(1, lastmonth, subtotal)
>> writeSheet.write(1, 13, Formula('SUM($B$1:$M$1'))
>> print "Saving file and leaving collectDBSubtotal" +save_file
>> wb.save(save_file+filename)
Ummmm ... saving the output file with the same name as the input file?
......
OK, let me see if my reading of the above is correct:
This function collectDBSubtotal has args:
database: key which may match a value in column 0.
lastmonth: values are 1, ..., 12
subtotal: a number to be put in column[lastmonth]
Each time the function is called:
(1) the input xls is opened, and the xlrd Book object is copied to an
xlwt Workbook object
(2) you loop over all the rows in the first worksheet; if column[0]
matches database, you update column[lastmonth] = subtotal.
(3) if database was not found, you write a new row at the end.
(4) not sure why you want to plug a row-total Formula in at all, let
alone why you rewrite it on non-database rows, but we'll ignore that for
the moment.
(5) the output workbook is saved.
Now that's a whole lot of pushing and shoving to update (effectively)
ONE cell[database][lastmonth] = subtotal. You mentioned 11 Mb of data
... does that mean that you are calling this function in a loop, with
many thousands of different (database, lastmonth, subtotal) tuples?
If so, I'm not altogether surprised that something gagged on that.
Perhaps xlrd needs to shut down the mmap object explicitly; perhaps not
every *x implementation of the underlying functionality works properly
when under extreme stress.
If you are actually doing what I fear, you should consider changing it
so that:
(1) your input data is a big list of (database, lastmonth, subtotal) tuples.
(2) you do these things exactly ONCE each: open input workbook, copy to
output workbook, read through the xlrd worksheet.cellvalue(rowx, 0)
making a dict which maps from database to rowx
(3) then you iterate over the list of tuples; for each tuple you do this:
if database in the_map:
rowx = the_map[database]
else:
the_map[database] = rowx = lastrowx = lastrowx + 1
writeSheet.write(rowx, 0, database)
writeSheet.write(rowx, lastmonth, subtotal)
# mucking about with formulas is left as an exercise
HTH,
John
Hi jennyshk,
>
> Thank you very much for your prompt replies. Sorry I wasn't very
> clear in my explanation.
The following explanation seems to have little bearing bearing on your
problem ...
> In general, my script produces a monthly
> summary report (excel format), of the types of databases that have
> been accessed through EZProxy webserver and finds out which University
> faculty they're from.
>
> That def collectDBSubtotal() method gets called from another method
> reportMonthlyStatistics() which pulls data from a class object
> FacultyBreakdown. Before reportMonthlyStatistics() is invoked to do a
> big stats report, the class object is called throughout the script to
> store information such as the database names and faculty names in the
> class constructor by x.incrDB(databaseName, facultyName). I feed the
> class each time with the 2 variables which are extracted from certain
> loglines in the logfile. About a months worth of web logfiles are
> searched where 1 daily gzipped logfile is about 12MB.
... except to tell us that you have a large quantity of data.
>
> I suspected that the problem came from biff2_8_load() due to the
> traceback call (posted below)
Posted below where? I see no traceback output.
> which occurs after a number of times the
> workbook is opened and rewritten to in the collectDBSubtotal(). Yes I
> too think its because of the sheer volume of data that finally, the
> workbook can no longer run and decides to fails to open. I have used
> parameter
>
> I have used mmap=false in open_workbook() and it worked! It took 78
> hours to run however
I'm rather boggled that you would let it run for so long ... don't you
check performance on a subset of your data? E.g. work out how many
whatevers per minute it is doing and extrapolate that to a month's worth?
And how many whatevers per minute was it achieving with use_mmap=True
before it fell over?
> so I will have to try cut some things out of my
> script to make it shorter. Anyway thanks greatly John and Chris for
> your input, I can finally move onto the next bits!
Presumably the "next bits" include looking through the list of questions
that I asked you in my previous message and addressing some of them, e.g.
(1) This one appears highly relevant to your 78-hour run time: """Now
that's a whole lot of pushing and shoving to update (effectively) ONE
cell[database][lastmonth] = subtotal. You mentioned 11 Mb of data ...
does that mean that you are calling this function in a loop, with many
thousands of different (database, lastmonth, subtotal) tuples?"""
(2) """What is "Unix 5.9"? Please tell us what you get when you run this:
import sys
print sys.platform
print sys.version
"""
This question wasn't posed out of idle curiosity -- quite apart from the
fact that that sort of info should be supplied up front without being
asked, there's the question of whether mmap is behaving badly on your
system and whether xlrd can be changed to work around or bypass the problem.