RAM spike and MemoryError


James Parris

Mar 29, 2013, 1:57:47 PM
to openpyx...@googlegroups.com
I am clearly doing something very wrong.  I am not a python developer, but rather a shell scripter.

I have a process that is only supposed to convert a single CSV file to xlsx, which should be easy, right?
Well, converting one 6 MB file is taking hundreds of MB of RAM.  Usually the file is only 1 MB, so it wasn't an issue.
This has worked for months, but now that the CSV has gotten a little larger, the process is useless.

Also, the usage only skyrockets on the last step (the wb.save command).


Here is my very short code:



# ------ Read file to Array ------------------- :
StoredData = []
try:
  FileReader = csv.reader(open(file), delimiter=D)
except:
  print "Unable to parse csv %s."%file
  usage(2)
for line in FileReader:
  for item in line:
    item = unicode(item.decode(locale.getpreferredencoding()))
  StoredData.append(line)
 
# ------ Write Array to Worksheet ------------- :

wb = Workbook()
ws = wb.create_sheet()
ws.title = os.path.basename(file).split('.')[0]

#for Row in StoredData: # doesn't work because AIX, workaround below
# ws.append(Row)

for R in xrange(len(StoredData)):
  for C in xrange(len(StoredData[R])):
    col = cell.get_column_letter(C + 1)
    ws.cell('%s%s'%(col, (R+1))).set_value_explicit(value=StoredData[R][C], data_type='s')
    StoredData[R][C] = ''

( The above odd way of writing is because this runs on AIX.  Explained here: https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/f2LNafTUKwY )

outfile = '.'.join([file.split('.')[0],'xlsx'])
# Bugfix  _ Remove extra worksheet if one exists.
print wb.get_sheet_names()
for sheet in wb.get_sheet_names():
  if sheet != ws.title:
    try:
      wb.remove_sheet(wb.get_sheet_by_name(sheet))
    except:
      print "Warning: failed to delete buggy worksheet '%s'."%sheet
      pass
# -------------------------

(That bugfix was added because the workbook was being created with a corrupted 'Sheet 1', which prevented the file from being opened in Windows.)
But at this point the system RAM usage is a little less than twice the size of the file, which is OK.  So for a 6 MB file, the RAM usage is around 10321 KB.

And then this last step happens.

wb.save(filename = outfile)

And then this happens:

svmon -Pu | grep 393712 

     Pid   Command          Inuse     
------------------------------------------------- 
  393712 python2.6        10362     
...
  393712 python2.6        117860   
...
  393712 python2.6        214441   
...
  393712 python2.6        409600   


Traceback (most recent call last):
  File "/batch/common/bin/excelify.py", line 94, in <module>
    wb.save(filename = outfile)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/workbook.py", line 207, in save
    save_workbook(self, filename)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 141, in save_workbook
    writer.save(filename)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 124, in save
    self.write_data(archive)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 70, in write_data
    self._write_worksheets(archive, shared_string_table, self.style_writer)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 91, in _write_worksheets
    style_writer.get_style_by_hash()))
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/worksheet.py", line 62, in write_worksheet
    write_worksheet_data(doc, worksheet, string_table, style_table)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/worksheet.py", line 174, in write_worksheet_data
    start_tag(doc, 'c', attributes)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/shared/xmltools.py", line 83, in start_tag
    doc.startElementNS((namespace, name), name, attr2)
  File "/opt/freeware/lib/python2.6/xml/sax/saxutils.py", line 139, in startElementNS
    self._write('<' + self._qname(name))
  File "/opt/freeware/lib/python2.6/xml/sax/saxutils.py", line 99, in _write
    self._out.write(text)
  File "/opt/freeware/lib/python2.6/StringIO.py", line 221, in write
    self.buflist.append(s)
MemoryError



Annnnd dead.




James Parris

Mar 30, 2013, 8:50:15 AM
to openpyx...@googlegroups.com
Replying to myself...


No hope, then?  I'll have to use xlwt and make an xls instead of an xlsx :(
Back to the drawing board.

Eric

Mar 30, 2013, 8:56:16 AM
to openpyx...@googlegroups.com

No no, you can create your workbook using wb = Workbook(True) and just append rows as you do already.
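
Something along these lines should do it (a rough sketch of the openpyxl 1.x optimised writer; 'input.csv', the comma delimiter and 'output.xlsx' are placeholders for whatever your script actually uses):

# Sketch of the optimised ("dump") writer in openpyxl 1.x.  Rows are
# streamed out as they are appended instead of being held in memory,
# which is what keeps wb.save() from blowing up on large files.
import csv
from openpyxl import Workbook

wb = Workbook(optimized_write=True)      # newer openpyxl spells this write_only=True
ws = wb.create_sheet()                   # create a sheet to append to

with open('input.csv', 'rb') as handle:  # Python 2 csv wants binary mode
    for row in csv.reader(handle, delimiter=','):
        ws.append(row)                   # append whole rows, as in your commented-out loop

wb.save('output.xlsx')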

Cheers
Eric


James Parris

Mar 30, 2013, 1:24:54 PM
to openpyx...@googlegroups.com
Wow great!  But what is 'True'?

James Parris

Mar 30, 2013, 1:58:50 PM
to openpyx...@googlegroups.com
optimized_write?

Charlie Clark

Mar 30, 2013, 4:19:54 PM
to openpyx...@googlegroups.com
Yes, see the docs: 

Charlie

James Parris <jparr...@gmail.com> wrote:
optimized_write?

Charlie Clark

Mar 30, 2013, 4:34:07 PM
to openpyx...@googlegroups.com
On 29.03.2013 at 18:57, James Parris <jparr...@gmail.com> wrote:

> I am clearly doing something very wrong. I am not a python developer, but
> rather a shell scripter.

Eric has already pointed out the optimised writer option. I've got some
comments on the Python.

> I have a process that is only supposed to convert a single csv file to
> xlxs, which should be easy, right?
> Well, converting one 6mb file is taking hundreds of mb. Usually the file
> is only 1mb, so it wasn't an issue.
> This has worked for months - but now that the csv has gotten a little
> larger, the process is useless.
>
> Also, the usage only sky-rockets on the last step (the wb.save command).
>
>
> Here is my very short code:
>
>
>>
>> # ------ Read file to Array ------------------- :
>> StoredData = []
>> try:
>>   FileReader = csv.reader(open(file), delimiter=D)
>> except:
>>   print "Unable to parse csv %s."%file
>>   usage(2)
>> for line in FileReader:
>>   for item in line:
>>     item = unicode(item.decode(locale.getpreferredencoding()))
>>   StoredData.append(line)

You almost certainly do not want to use locale.getpreferredencoding() but an
explicit encoding such as UTF-8. And unicode() is unnecessary if you are
already decoding the item.
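
In other words, roughly (assuming the CSV really is UTF-8):

# str.decode() already returns a unicode object in Python 2, so the extra
# unicode() call adds nothing; an explicit codec avoids depending on
# whatever the host locale happens to be set to.
item = item.decode('utf-8')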

Also you probably don't want to use a temporary list like StoredData. You
can either simply add to your worksheet while you iterate through the CSV,
or create a generator function that yields lines from your CSV reader.
Growing lists is fairly expensive in Python.
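
For example, a sketch of the generator approach (file, D and ws are the names from your script; ws.append() assumes the optimised writer mentioned above, or any worksheet where append works):

import csv

def csv_rows(path, delimiter, encoding='utf-8'):
    # Yield one decoded row at a time, so the whole file is never in memory.
    with open(path, 'rb') as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield [value.decode(encoding) for value in row]

for row in csv_rows(file, D):
    ws.append(row)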

>
>> # ------ Write Array to Worksheet ------------- :
>>
>> wb = Workbook()
>> ws = wb.create_sheet()
>> ws.title = os.path.basename(file).split('.')[0]
>>
>> #for Row in StoredData: # doesn't work because AIX, workaround below
>> # ws.append(Row)

What is the AIX error?

>> for R in xrange(len(StoredData)):
>>   for C in xrange(len(StoredData[R])):
>>     col = cell.get_column_letter(C + 1)
>>     ws.cell('%s%s'%(col, (R+1))).set_value_explicit(value=StoredData[R][C], data_type='s')
>>     StoredData[R][C] = ''

Use enumerate() on a list to get the index position and the item.
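
I.e. roughly, keeping your set_value_explicit() workaround:

# enumerate(..., 1) yields the 1-based row and column numbers together with
# the values, so there is no need to index back into StoredData.
for R, Row in enumerate(StoredData, 1):
    for C, Value in enumerate(Row, 1):
        col = cell.get_column_letter(C)
        ws.cell('%s%s' % (col, R)).set_value_explicit(value=Value, data_type='s')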

> ( The above odd way of writing is because this runs on AIX. Explained
> here.<https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/f2LNafTUKwY>
> )
>
> outfile = '.'.join([file.split('.')[0],'xlsx'])

Use the os.path module to manipulate file names and paths.
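
For example:

import os

# os.path.splitext() strips only the extension, so a dot anywhere else in
# the path does not truncate it the way file.split('.')[0] can.
base, ext = os.path.splitext(file)
outfile = base + '.xlsx'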

>> # Bugfix _ Remove extra worksheet if one exists.
>> print wb.get_sheet_names()
>> for sheet in wb.get_sheet_names():
>>   if sheet != ws.title:
>>     try:
>>       wb.remove_sheet(wb.get_sheet_by_name(sheet))
>>     except:
>>       print "Warning: failed to delete buggy worksheet '%s'."%sheet
>>       pass
>> # -------------------------

Always use explicit exception classes in an except clause, e.g. except
AttributeError.
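
For example (AttributeError is just the stand-in from above; catch whichever exception remove_sheet() actually raises in your version):

try:
    wb.remove_sheet(wb.get_sheet_by_name(sheet))
except AttributeError:
    # A bare except: would also swallow MemoryError, KeyboardInterrupt and
    # plain typos, which makes real failures much harder to diagnose.
    print "Warning: failed to delete buggy worksheet '%s'." % sheet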

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226