> Hi openpyxl-users,
>
> I recently upgraded from 2.0.4 to 2.2.4 and noticed a significant
> increase
> in memory usage when opening the same workbook (with load_workbook):
>
> Excel file = 35 MB
> 2.0.4, 32bit = 800 MB
> 2.0.4, 64bit = 1400 MB
> 2.2.4, 32bit > 1700 MB # python complains about memory and gives up
> 2.2.4, 64bit = 3100 MB
>
> I know there were a lot of changes between the two versions, but I wonder
> if anyone has any insight into this?
>
> I'm aware of the optimized writer and reader. The memory usage becomes an
> issue on the 32bit machine, since as far as I know I can't apply
> conditional formatting and use the optimized writer.
>
> I imagine this is working as intended, offering more features taking up
> more memory. Or maybe I'm just doing it wrong?
Some more on this. I've just done some work on memory optimisation and,
depending upon the file, I'm getting similar numbers in 2.3b2 as with
previous versions. On a simple file with just numbers memory use should be
less than 2.0. Using the memory_profiler module I get similar numbers to
what the OS (Mac OS 10.10.5, 64-bit) reports. 2.2 certainly does use the
most memory but I don't see anything like as much variance as you are
between versions.
2.2 and 2.3 have a lot more functionality and should, in general, be
faster than previous versions. I do have one file where 2.2 is turning out
to be by far the fastest and with the lowest memory footprint. This will
require more investigation but I suspect it may be related to the new
tokeniser expanding lots of shared formulae.
Some profiling obtained using the following script:
import time
from memory_profiler import memory_usage
from openpyxl import load_workbook
fname = "openpyxl/benchmarks/files/very_large.xlsx" # has 1 million cells
def pixel():
start = time.clock()
wb = load_workbook(fname)
end = time.clock()
print("openpyxl {:0.1f} s".format(end - start))
use = memory_usage(proc=-1, interval=1)[0]
print("Memory use %.1f MB" % use)
from pympler.muppy import print_summary
print_summary()
if __name__ == '__main__':
from openpyxl import __version__
print()
print(__version__)
pixel()
2.0.5
openpyxl 37.5 s
Memory use 447.1 MB
types | # objects | total size
================================= | =========== | ============
<class 'openpyxl.cell.cell.Cell | 1000000 | 106.81 MB
<class 'str | 1016954 | 50.63 MB
<class 'dict | 2196 | 49.91 MB
<class 'int | 1491119 | 39.82 MB
<class 'type | 716 | 682.81 KB
<class 'code | 4659 | 655.37 KB
<class '_io.BufferedWriter | 2 | 256.34 KB
<class 'collections.OrderedDict | 8 | 254.03 KB
<class 'set | 425 | 211.47 KB
<class 'list | 1607 | 168.67 KB
<class '_io.BufferedReader | 1 | 128.17 KB
<class 'tuple | 1905 | 127.47 KB
<class 'weakref | 1389 | 108.52 KB
<class 'wrapper_descriptor | 1275 | 99.61 KB
<class 'method_descriptor | 1171 | 82.34 KB
2.1.5
openpyxl 40.2 s
Memory use 510.6 MB
types | # objects |
total size
======================================================= | =========== |
============
<class 'openpyxl.cell.cell.Cell | 1000000 |
114.44 MB
<class 'str | 2017245 |
102.32 MB
<class 'dict | 4248
| 50.32 MB
<class 'int | 1491148
| 39.82 MB
<class 'type | 726 |
692.72 KB
<class 'code | 4780 |
672.40 KB
<class '_io.BufferedWriter | 2 |
256.34 KB
<class 'openpyxl.worksheet.dimensions.DimensionHolder | 1 |
244.98 KB
<class 'set | 439 |
226.53 KB
<class 'list | 1614 |
169.23 KB
<class 'tuple | 1953 |
130.26 KB
<class '_io.BufferedReader | 1 |
128.17 KB
<class 'weakref | 1382 |
107.97 KB
<class 'wrapper_descriptor | 1275
| 99.61 KB
<class 'method_descriptor | 1171
| 82.34 KB
2.2.6
openpyxl 38.1 s
Memory use 515.6 MB
types | # objects |
total size
======================================================= | =========== |
============
<class 'openpyxl.cell.cell.Cell | 1000000 |
175.48 MB
<class 'str | 1989515 |
101.00 MB
<class 'dict | 2339
| 49.91 MB
<class 'int | 1490160
| 39.80 MB
<class 'type | 729 |
695.42 KB
<class 'code | 4845 |
681.54 KB
<class '_io.BufferedWriter | 2 |
256.34 KB
<class 'openpyxl.worksheet.dimensions.DimensionHolder | 1 |
244.98 KB
<class 'set | 451 |
231.16 KB
<class 'tuple | 2006 |
133.63 KB
<class '_io.BufferedReader | 1 |
128.17 KB
<class 'openpyxl.worksheet.dimensions.ColumnDimension | 1001 |
125.12 KB
<class 'openpyxl.worksheet.dimensions.RowDimension | 1001 |
125.12 KB
<class 'weakref | 1400 |
109.38 KB
<class 'wrapper_descriptor | 1275
| 99.61 KB
2.3.0-b2
openpyxl 33.5 s
Memory use 399.8 MB
types | # objects | total
size
=============================================== | =========== |
============
<class 'openpyxl.cell.cell.Cell | 1000000 | 106.81
MB
<class 'dict | 3108 | 50.16
MB
<class 'int | 1490351 | 39.80
MB
<class 'str | 17566 | 1.82
MB
<class 'type | 709 | 676.48
KB
<class 'code | 4792 | 674.09
KB
<class 'set | 545 | 304.72
KB
<class '_io.BufferedWriter | 2 | 256.34
KB
<class 'openpyxl.descriptors.MetaSerialisable | 230 | 220.56
KB
<class 'tuple | 2425 | 162.29
KB
<class '_io.BufferedReader | 2 | 128.34
KB
<class 'weakref | 1576 | 123.12
KB
<class 'wrapper_descriptor | 1275 | 99.61
KB
<class 'method_descriptor | 1171 | 82.34
KB
<class 'list | 622 | 77.70
KB