Hiya Alex,
Am .09.2014, 01:22 Uhr, schrieb <
axf...@gmail.com>:
> OpenPyxl is dying on me when trying to work with a workbook that has one
> sheet containing about 500k rows and another sheet containing about 60k
> rows -- Python's memory usage steadily climbs to around 1gb then is
> automatically terminated. I could conceivably raise this limit to around
> 2gb, but I'm afraid that might not do the trick anyhow, and after seeing
> that there are apparently some optimized methods available, I'd like to
> tryusing them.
Memory use is driven by a number of things but mainly the number of cells
and what they contain. You are working with some pretty large files and if
those rows are wide then you *will* have to increase the amount of memory
available to Python.
The implementation for cells is already very good - I ran some comparisons
with dictionaries and they are only slightly faster and smaller.
> Here's the relevant snippet of my code:
> workbook = openpyxl.load_workbook(workbookname)
> sheets = workbook.get_sheet_names()
> totalsheets = len(sheets)
> if totalsheets!=1:
> workbooks = {}
> for y in sheets:
> if y != "DV-IDENTITY-0":
> workbooks[y] = copy.deepcopy(workbook)
> for x in sheets:
> if x != y:
> workbooks[y].remove_sheet(workbooks[y].get_sheet_by_name(x))
> cleanname = re.sub(r'[^a-zA-Z0-9\-_]','',y)
> filename = csvlist_filepath+cleanname+'.xlsx'
> workbooks[y].save(filename)
> Works fine on small files. When I try adding use_iterators = True as an
> argument to that first load_workbook() line, however, openpyxl dies with
The optimised modes are for when you are either reading or writing only. I
don't think that is your case so you will have to load everything into
memory.
Using copy.deepcopy is not advised on workbooks and I wouldn't be
surprised to see it add to your memory woes. See
https://bitbucket.org/openpyxl/openpyxl/issue/171/ for more information.
The traceback is probably misleading. Read-only workbooks and worksheets
inherit from the standard classes which means that methods are available
unless they are explicitly disabled or removed: you simply shouldn't be
able to remove a worksheet from a read-only workbook. Work is ongoing to
clean this up either to make things work, or remove the method, or provide
you an error. Starting with 2.1 we also refer to read-only or write-only
workbooks.
It seems there is some demand for a kind of template function which we
might be able to offer in a future version. Like the other optimisations
this will involve some kind of compromise (ie. some things won't be
possible). It looks to me at the moment like an extension of the existing
write-only workbooks.
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel:
+49-211-600-3657
Mobile:
+49-178-782-6226