Making a big Excel file without running out of memory


Ram Rachum

May 15, 2011, 6:41:10 PM
to python...@googlegroups.com
Hello folks,

I work on a Django website, and I wrote code that uses `xlwt` to produce a big Excel file, with about 10,000 rows and 15 columns. The `.xls` file weighs about 1.5 megabytes. I actually don't know if this is considered a big file or not; 1.5 megabytes sounds pretty small, but generating this report seems to have crashed our (decent) web server several times. When we make smaller Excel files, like 1,000 rows, the server doesn't crash, but 10,000 rows make Apache crash and stop responding to visitors.

I'm guessing that the server just runs out of memory. We keep the entire `.xls` file as a string and then send it to the browser. Perhaps this is causing the problem? But on the other hand, the `.xls` is only 1.5 MB, so perhaps the memory is wasted on `xlwt`'s data structures?

Does anyone have any clue what to do in order to generate big Excel files without crashing our server?
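
For reference, a simplified sketch of what the view does (names such as `get_records` are made up for illustration):

import cStringIO
import xlwt
from django.http import HttpResponse

def report_view(request):
    wb = xlwt.Workbook()
    ws = wb.add_sheet('Report')
    for r, record in enumerate(get_records()):  # ~10,000 rows
        for c, value in enumerate(record):  # ~15 columns
            ws.write(r, c, value)
    f = cStringIO.StringIO()
    wb.save(f)  # the whole .xls ends up in memory as one string
    response = HttpResponse(f.getvalue(), mimetype='application/vnd.ms-excel')
    response['Content-Disposition'] = 'attachment; filename=report.xls'
    return response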


Thanks,
Ram.

derek

May 16, 2011, 2:37:04 AM
to python-excel
Just a heads-up, Ram. John, who answers most of the questions on this
list, will not be impressed with vague statements like "seems to have
crashed" and "I'm guessing that ...".

When reporting a problem, you need to be as specific as possible about
elements such as:
* the specs for your machine, including versions of software in use
* error traces/logs that are created
* the script that is causing the problem (full details, if possible,
otherwise a minimal test script that can reproduce this problem)
* whether or not you can reproduce the crash outside of the Django
environment.

I also say this because I have managed to produce large Excel reports
under Django without the problems you have encountered.

Derek

Chris Withers

May 16, 2011, 2:51:23 AM
to python...@googlegroups.com, Ram Rachum
On 15/05/2011 23:41, Ram Rachum wrote:
> I work on a Django website, and I wrote code that uses `xlwt` to produce
> a big Excel file, with about 10,000 rows and 15 columns. The `.xls` file
> weighs about 1.5 megabytes.

That's not a big workbook ;-)

> a big file or not; 1.5 megabytes sounds pretty small, but generating
> this report seems to have crashed our (decent) web server several times.

I'd suggest the "crash", whether that's memory usage or something else,
is related to the Django ORM, and not xlwt.

> I'm guessing that the server just runs out of memory.

How about doing some work to make that not a guess?

> causing the problem? But on the other hand, the `.xls` is only 1.5 MB, so
> perhaps the memory is wasted on `xlwt`'s data structures?

A couple of guesses:

- You're creating way too many style objects. You should be using easyxf
to create a few styles and re-use them, rather than creating a new one
each time you write a cell.

- You might want to try calling flush_row_data every 1000 rows or so
(see the sketch below for both).
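
A minimal sketch of both ideas together (assuming rows are written
strictly in order, since flushed rows can't be revisited):

import xlwt

# Create the style once, up front, and re-use it for every cell.
CELL_STYLE = xlwt.easyxf('align: wrap on')

wb = xlwt.Workbook()
ws = wb.add_sheet('Report')
for r in xrange(10000):
    for c in xrange(15):
        ws.write(r, c, 'data', CELL_STYLE)
    if r and r % 1000 == 0:
        ws.flush_row_data()  # let xlwt release rows that are finished
wb.save('report.xls')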

I'd suggest reading the tutorial on http://www.python-excel.org/

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

John Machin

May 16, 2011, 4:19:28 AM
to python...@googlegroups.com
On Mon, May 16, 2011 4:37 pm, derek wrote:
> On May 16, 12:41 am, Ram Rachum <ram.rac...@gmail.com> wrote:
>> Hello folks,
>>
>> I work on a Django website, and I wrote code that uses `xlwt` to
>> produce a big Excel file, with about 10,000 rows and 15 columns. The
>> `.xls` file weighs about 1.5 megabytes. I actually don't know if this
>> is considered a big file or not; 1.5 megabytes sounds pretty small,
>> but generating this report seems to have crashed our (decent) web
>> server several times. When we make smaller Excel files, like 1,000
>> rows, the server doesn't crash, but 10,000 rows make Apache crash and
>> stop responding to visitors.
>>
>> I'm guessing that the server just runs out of memory. We keep the
>> entire `.xls` file as a string and then send it to the browser.
>> Perhaps this is causing the problem? But on the other hand, the `.xls`
>> is only 1.5 MB, so perhaps the memory is wasted on `xlwt`'s data
>> structures?
>>
>> Does anyone have any clue what to do in order to generate big Excel
>> files without crashing our server?
>
> Just a heads-up, Ram. John, who answers most of the questions on this
> list, will not be impressed with vague statements like "seems to have
> crashed" and "I'm guessing that ...".

Derek, I would imagine that most people (including Ram's
boss/customers/users) would not be impressed by such a seeming lack of
diagnostics (can't tell whether it's chewing up memory??) and logging.

>
> When reporting a problem, you need to be as specific as possible about
> elements such as:
> * the specs for your machine, including versions of software in use
> * error traces/logs that are created
> * the script that is causing the problem (full details, if possible,
> otherwise a minimal test script that can reproduce this problem)
> * whether or not you can reproduce the crash outside of the Django
> environment.
>
> I also say this because I have managed to produce large Excel reports
> under Django without the problems you have encountered.

All very good advice, as is what Chris says in another message. Here's
some more:

Ram: instrument your code. Log (1) the start of making the XLS file, (2)
every 1000 rows, and (3) when you have finished writing rows:

import cStringIO

log("end of data")
f = cStringIO.StringIO()
workbook.save(f)
del workbook, worksheet  # drop xlwt's structures as soon as they're saved
log("workbook saved")
xls_string = f.getvalue()
f.close()
del f  # release the buffer before sending
log("got xls_string")
send_xls_2_browser(xls_string)
del xls_string
log("completed OK")

Ensure that the log is flushed to disk after each message, so nothing is
lost if something crashes.
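
For example, a minimal flushing log function might look like this (a
sketch; wire it into whatever logging setup you already have):

import sys
import time

def log(msg):
    # Write and flush immediately so the last message survives a crash.
    sys.stderr.write("%s %s\n" % (time.strftime("%H:%M:%S"), msg))
    sys.stderr.flush()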

I really don't expect a memory problem with a tiny file like 1.5 MB, but
xlwt does have a peak memory usage during workbook.save() that includes
twice the size of the output file, plus the file's size again if a
pseudo-file (StringIO) is being used -- so roughly 4.5 MB at peak for a
1.5 MB file. Having the code there will be handy when you get to doing
really big files. In any case it will be handy now, because it will help
you determine whether it is crashing in xlwt code and, if so, where.
If it crashes at the end, run it again with 20000 rows; does it still
crash at the end, or after 10000 rows?

Walter Prins

May 16, 2011, 5:48:12 AM
to python...@googlegroups.com
To add to what's been said already: just as a test, I've written a simple dumb loop that outputs a sheet with 20,000 rows and 30 columns, writing every cell individually with the value "test" (a reconstructed sketch follows). It completes in about 10 seconds on my box (Core2 2.3GHz, 2GB RAM) and, eyeballing Process Explorer, Python's RAM use peaks at approximately 160MB before dropping again.
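
Something like this (a sketch of the test as described):

import xlwt

wb = xlwt.Workbook()
ws = wb.add_sheet('test')
for r in xrange(20000):
    for c in xrange(30):
        ws.write(r, c, 'test')
wb.save('test.xls')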

So, I think your issue is probably not related to xlwt.

Walter

John Machin

May 16, 2011, 7:02:26 AM
to python...@googlegroups.com
On Mon, May 16, 2011 4:51 pm, Chris Withers wrote:

> - you're creating way too many style objects, you should be using easyxf
> to create a few styles and re-using them, rather than creating them each
> time you write a cell.

This is very good advice in general. However (with a relatively recent
version of xlwt) it won't cause a memory problem; offenders are blown away
before they can do any damage:

>>> import xlwt
>>> wb = xlwt.Workbook()
>>> ws = wb.add_sheet('x')
>>> for x in xrange(10000):
...     xf = xlwt.easyxf("align: wrap on")
...     try:
...         ws.write(x, 0, 'foo', xf)
...     except:
...         print "nasty at row index", x
...         raise
...
nasty at row index 4078
Traceback (most recent call last):
[snip]
  File "C:\python27\lib\site-packages\xlwt\Style.py", line 149, in _add_style
    raise ValueError("More than 4094 XFs (styles)")
ValueError: More than 4094 XFs (styles)
>>>


> - You might want to try calling flush_row_data every 1000 rows or so.

+1

> I'd suggest reading the tutorial on http://www.python-excel.org/

+2

Cheers,
John

Ram Rachum

May 16, 2011, 3:50:24 PM
to python...@googlegroups.com, Ram Rachum
Thanks everyone for your help! Simply using `.flush_row_data()` solved the problem.

Thanks,
Ram.

Chris Withers

May 17, 2011, 10:15:36 AM
to python...@googlegroups.com, John Machin
On 16/05/2011 12:02, John Machin wrote:
>> - you're creating way too many style objects, you should be using easyxf
>> to create a few styles and re-using them, rather than creating them each
>> time you write a cell.
>
> This is very good advice in general. However (with a relatively recent
> version of xlwt) it won't cause a memory problem;

Is this a released version of xlwt or svn trunk?
Also, what are you assuming is passed to the Workbook constructor for
style_compression?

cheers,

Chris

PS: John, I've sent you a number of messages off list over the last 6
months or so, and not had a reply to any of them. Can I just confirm
those messages are getting through?

John Machin

May 17, 2011, 7:17:22 PM
to python...@googlegroups.com
On Wed, May 18, 2011 12:15 am, Chris Withers wrote:
> On 16/05/2011 12:02, John Machin wrote:
>>> - you're creating way too many style objects, you should be using
>>> easyxf
>>> to create a few styles and re-using them, rather than creating them
>>> each
>>> time you write a cell.
>>
>> This is very good advice in general. However (with a relatively recent
>> version of xlwt) it won't cause a memory problem;
>
> Is this a released version of xlwt or svn trunk?

SVN revision 3147 on Sunday 7 Oct 2007. Limit checked as part of
introduction of style compression. Included in xlwt 0.7.0 release.

> Also, what are you assuming is passed to the Workbook constructor for
> style_compression?

No assumption. The sample code that I showed defaulted to no compression,
which presumably is what the OP was using. The check is made whether or
not compression is being used. The idea is to avoid creating a dodgy XLS
and leaving the end-user to puzzle over a vague message from Excel.
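
For what it's worth, a sketch of using the compression option with the
same per-cell easyxf misuse as the earlier session:

import xlwt

# style_compression=2 de-duplicates identical font and XF records as they
# are added, so repeated easyxf() calls with the same definition no longer
# pile up towards the 4094-XF limit.
wb = xlwt.Workbook(style_compression=2)
ws = wb.add_sheet('x')
for x in xrange(10000):
    ws.write(x, 0, 'foo', xlwt.easyxf("align: wrap on"))
wb.save('no_nasty.xls')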

> PS: John, I've sent you a number of messages off list over the last 6
> months or so, and not had a reply to any of them. Can I just confirm
> those messages are getting through?

Hmmmm ... automatically diverted into a "must-read" folder :-( ... expect
replies soon.

Cheers,
John

Chris Withers

May 20, 2011, 10:19:05 AM
to python...@googlegroups.com
On 18/05/2011 00:17, John Machin wrote:
> Hmmmm ... automatically diverted into a "must-read" folder :-( ... expect
> replies soon.

Can I get a definition of "soon"? ;-)

Chris

Chris Withers

Jun 10, 2011, 10:28:37 AM
to python...@googlegroups.com
On 20/05/2011 15:19, Chris Withers wrote:
> On 18/05/2011 00:17, John Machin wrote:
>> Hmmmm ... automatically diverted into a "must-read" folder :-( ... expect
>> replies soon.
>
> Can I get a definition of "soon"? ;-)

Apparently not :-(
