RAM spike and MemoryError


James Parris

Mar 29, 2013, 1:57:47 PM
to openpyx...@googlegroups.com
I am clearly doing something very wrong.  I am not a python developer, but rather a shell scripter.

I have a process that is only supposed to convert a single CSV file to xlsx, which should be easy, right?
Well, converting one 6 MB file is taking hundreds of MB of RAM.  Usually the file is only 1 MB, so it wasn't an issue.
This has worked for months, but now that the CSV has gotten a little larger, the process is useless.

Also, the usage only skyrockets on the last step (the wb.save command).


Here is my very short code:



# ------ Read file to Array ------------------- :
StoredData = []
try:
  FileReader = csv.reader(open(file), delimiter=D)
except:
  print "Unable to parse csv %s."%file
  usage(2)
for line in FileReader:
  for item in line:
    item = unicode(item.decode(locale.getpreferredencoding()))
  StoredData.append(line)
 
# ------ Write Array to Worksheet ------------- :

wb = Workbook()
ws = wb.create_sheet()
ws.title = os.path.basename(file).split('.')[0]

#for Row in StoredData: # doesn't work because AIX, workaround below
# ws.append(Row)

for R in xrange(len(StoredData)):
  for C in xrange(len(StoredData[R])):
    col = cell.get_column_letter(C + 1)
    ws.cell('%s%s'%(col, (R+1))).set_value_explicit(value=StoredData[R][C], data_type='s')
    StoredData[R][C] = ''

( The above odd way of writing is because this runs on AIX.  Explained here: https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/f2LNafTUKwY )

outfile = '.'.join([file.split('.')[0],'xlsx'])
# Bugfix  _ Remove extra worksheet if one exists.
print wb.get_sheet_names()
for sheet in wb.get_sheet_names():
  if sheet != ws.title:
    try:
      wb.remove_sheet(wb.get_sheet_by_name(sheet))
    except:
      print "Warning: failed to delete buggy worksheet '%s'."%sheet
      pass
# -------------------------

(That bugfix was added because the workbook was being created with a corrupted 'Sheet 1', which prevented the file from being opened in Windows.)
But at this point the system RAM usage is a little less than twice the size of the file, which is OK.  So for a 6 MB file, the RAM usage is around 10321 KB.

And then this last step happens.

wb.save(filename = outfile)

And then this happens:

svmon -Pu | grep 393712 

     Pid   Command          Inuse     
------------------------------------------------- 
  393712 python2.6        10362     
...
  393712 python2.6        117860   
...
  393712 python2.6        214441   
...
  393712 python2.6        409600   


Traceback (most recent call last):
  File "/batch/common/bin/excelify.py", line 94, in <module>
    wb.save(filename = outfile)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/workbook.py", line 207, in save
    save_workbook(self, filename)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 141, in save_workbook
    writer.save(filename)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 124, in save
    self.write_data(archive)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 70, in write_data
    self._write_worksheets(archive, shared_string_table, self.style_writer)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/excel.py", line 91, in _write_worksheets
    style_writer.get_style_by_hash()))
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/worksheet.py", line 62, in write_worksheet
    write_worksheet_data(doc, worksheet, string_table, style_table)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/writer/worksheet.py", line 174, in write_worksheet_data
    start_tag(doc, 'c', attributes)
  File "/batch/common/lib/python2.6/site-packages/openpyxl/shared/xmltools.py", line 83, in start_tag
    doc.startElementNS((namespace, name), name, attr2)
  File "/opt/freeware/lib/python2.6/xml/sax/saxutils.py", line 139, in startElementNS
    self._write('<' + self._qname(name))
  File "/opt/freeware/lib/python2.6/xml/sax/saxutils.py", line 99, in _write
    self._out.write(text)
  File "/opt/freeware/lib/python2.6/StringIO.py", line 221, in write
    self.buflist.append(s)
MemoryError



Annnnd dead.




James Parris

Mar 30, 2013, 8:50:15 AM
to openpyx...@googlegroups.com
Replying to myself...


No hope, then?  I'll have to use xlwt and make an xls instead of an xlsx :(
Back to the drawing board.

Eric

Mar 30, 2013, 8:56:16 AM
to openpyx...@googlegroups.com

No no, you can create your workbook using wb = Workbook(True) and just append rows as you do already.
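
Something along these lines should do it (a rough sketch of the openpyxl 1.x optimised writer; 'input.csv', the comma delimiter and 'output.xlsx' are placeholders for whatever your script actually uses):

# Sketch of the optimised ("dump") writer in openpyxl 1.x.  Rows are
# streamed out as they are appended instead of being held in memory,
# which is what keeps wb.save() from blowing up on large files.
import csv
from openpyxl import Workbook

wb = Workbook(optimized_write=True)      # newer openpyxl spells this write_only=True
ws = wb.create_sheet()                   # create a sheet to append to

with open('input.csv', 'rb') as handle:  # Python 2 csv wants binary mode
    for row in csv.reader(handle, delimiter=','):
        ws.append(row)                   # append whole rows, as in your commented-out loop

wb.save('output.xlsx')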

Cheers
Eric


James Parris

Mar 30, 2013, 1:24:54 PM
to openpyx...@googlegroups.com
Wow great!  But what is 'True'?

James Parris

Mar 30, 2013, 1:58:50 PM
to openpyx...@googlegroups.com
optimized_write?

Charlie Clark

Mar 30, 2013, 4:19:54 PM
to openpyx...@googlegroups.com
Yes, see the docs: 

Charlie

James Parris <jparr...@gmail.com> wrote:
optimized_write?

Charlie Clark

Mar 30, 2013, 4:34:07 PM
to openpyx...@googlegroups.com
On 29.03.2013 at 18:57, James Parris <jparr...@gmail.com> wrote:

> I am clearly doing something very wrong. I am not a python developer, but
> rather a shell scripter.

Eric has already pointed out the optimised writer option. I've got some
comments on the Python.

> I have a process that is only supposed to convert a single csv file to
> xlxs, which should be easy, right?
> Well, converting one 6mb file is taking hundreds of mb. Usually the file
> is only 1mb, so it wasn't an issue.
> This has worked for months - but now that the csv has gotten a little
> larger, the process is useless.
>
> Also, the usage only sky-rockets on the last step (the wb.save command).
>
>
> Here is my very short code:
>
>
>>
>> # ------ Read file to Array ------------------- :
>> StoredData = []
>> try:
>>   FileReader = csv.reader(open(file), delimiter=D)
>> except:
>>   print "Unable to parse csv %s."%file
>>   usage(2)
>> for line in FileReader:
>>   for item in line:
>>     item = unicode(item.decode(locale.getpreferredencoding()))
>>   StoredData.append(line)

You almost certainly do not want to use locale.getpreferredencoding() but an
explicit encoding such as UTF-8. And unicode() is unnecessary if you are
already decoding the item.
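
In other words, roughly (assuming the CSV really is UTF-8):

# str.decode() already returns a unicode object in Python 2, so the extra
# unicode() call adds nothing; an explicit codec avoids depending on
# whatever the host locale happens to be set to.
item = item.decode('utf-8')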

Also you probably don't want to use a temporary list like StoredData. You
can either simply add to your worksheet while you iterate through the CSV,
or create a generator function that yields lines from your CSV reader.
Growing lists is fairly expensive in Python.
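
For example, a sketch of the generator approach (file, D and ws are the names from your script; ws.append() assumes the optimised writer mentioned above, or any worksheet where append works):

import csv

def csv_rows(path, delimiter, encoding='utf-8'):
    # Yield one decoded row at a time, so the whole file is never in memory.
    with open(path, 'rb') as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield [value.decode(encoding) for value in row]

for row in csv_rows(file, D):
    ws.append(row)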

>
>> # ------ Write Array to Worksheet ------------- :
>>
>> wb = Workbook()
>> ws = wb.create_sheet()
>> ws.title = os.path.basename(file).split('.')[0]
>>
>> #for Row in StoredData: # doesn't work because AIX, workaround below
>> # ws.append(Row)

What is the AIX error?

>> for R in xrange(len(StoredData)):
>>   for C in xrange(len(StoredData[R])):
>>     col = cell.get_column_letter(C + 1)
>>     ws.cell('%s%s'%(col, (R+1))).set_value_explicit(value=StoredData[R][C], data_type='s')
>>     StoredData[R][C] = ''

Use enumerate() on a list to get the index position and the item.
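
I.e. roughly, keeping your set_value_explicit() workaround:

# enumerate(..., 1) yields the 1-based row and column numbers together with
# the values, so there is no need to index back into StoredData.
for R, Row in enumerate(StoredData, 1):
    for C, Value in enumerate(Row, 1):
        col = cell.get_column_letter(C)
        ws.cell('%s%s' % (col, R)).set_value_explicit(value=Value, data_type='s')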

> ( The above odd way of writing is because this runs on AIX. Explained
> here.<https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/f2LNafTUKwY>
> )
>
> outfile = '.'.join([file.split('.')[0],'xlsx'])

Use the os.path module to manipulate file names and paths.
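
For example:

import os

# os.path.splitext() strips only the extension, so a dot anywhere else in
# the path does not truncate it the way file.split('.')[0] can.
base, ext = os.path.splitext(file)
outfile = base + '.xlsx'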

>> # Bugfix _ Remove extra worksheet if one exists.
>> print wb.get_sheet_names()
>> for sheet in wb.get_sheet_names():
>>   if sheet != ws.title:
>>     try:
>>       wb.remove_sheet(wb.get_sheet_by_name(sheet))
>>     except:
>>       print "Warning: failed to delete buggy worksheet '%s'."%sheet
>>       pass
>> # -------------------------

Always use explicit exception classes in an except clause, e.g. except
AttributeError.
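
For example (AttributeError is just the stand-in from above; catch whichever exception remove_sheet() actually raises in your version):

try:
    wb.remove_sheet(wb.get_sheet_by_name(sheet))
except AttributeError:
    # A bare except: would also swallow MemoryError, KeyboardInterrupt and
    # plain typos, which makes real failures much harder to diagnose.
    print "Warning: failed to delete buggy worksheet '%s'." % sheet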

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226