On .01.2014, 23:00, Iuri <iuris...@gmail.com> wrote:
> I tried your diff, it breaks my code.
> Traceback (most recent call last):
>   File "a.py", line 10, in <module>
>     for n, line in enumerate(w.iter_rows()):
>   File "f:\projects\openpyxl\openpyxl\reader\iter_worksheet.py", line 220, in get_squared_range
>     cell = cell._replace(internal_value=unicode(self._string_table[int(cell.internal_value)]))  #pylint: disable-msg=W0212
> ValueError: invalid literal for int() with base 10: ''
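For reference, the exception itself is just `int('')` failing on an empty `internal_value`. A defensive version of that lookup might look like the sketch below; `resolve_shared_string` is a hypothetical helper for illustration, not part of openpyxl:

```python
def resolve_shared_string(raw_value, string_table):
    """Look up a shared string by index, tolerating empty raw values.

    Hypothetical helper for illustration; not openpyxl code.
    """
    if raw_value in (None, ""):
        return None                      # empty cell: nothing to look up
    return string_table[int(raw_value)]  # the line that raised ValueError

print(resolve_shared_string("1", ["zero", "one"]))  # one
print(resolve_shared_string("", ["zero", "one"]))   # None
```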
That looks like a line break issue caused by the e-mail. I ran some
tests locally and element.clear() has no effect. This is only to be
expected, as the element is overwritten every time the loop runs.
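For reference, the standard iterparse recipe keeps memory roughly flat by clearing the tree as you go. A minimal, self-contained sketch with stdlib ElementTree follows; the toy XML stands in for a worksheet stream and is not openpyxl's actual format:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Toy stand-in for a large worksheet XML stream (illustrative only).
xml = b"<rows>" + b"".join(b'<row n="%d"/>' % i for i in range(1000)) + b"</rows>"

context = ET.iterparse(BytesIO(xml), events=("start", "end"))
_, root = next(context)  # the first event is ("start", root)

count = 0
for event, elem in context:
    if event == "end" and elem.tag == "row":
        count += 1       # process the row here
        root.clear()     # drop already-processed rows so the tree stays small

print(count)  # 1000
```

The key point is that elem.clear() alone only empties the element; the root still holds a reference to it, so the root (or, with lxml, the preceding siblings) has to be cleared as well for memory to actually be released.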
> I also tested element.clear() inside the if block; it improved the
> memory usage, but it is not really fixed. I ran around 10k lines and the
> python process was using ~300 MB. This memory usage works for me, but
> 1.6.2 limited the memory usage to around 20 MB.
> Before this patch, it was using ~2 GB (!!) for 10k lines. With all 30k
> lines, it was killed at ~5 GB.
That is indeed excessive. I only have benchmark/reader.py to go on here,
but when I compare it with
https://bitbucket.org/ericgazoni/openpyxl/commits/1043f5c5cde6ccacf553632069e733386a1dd6f9
there is little or no difference. I've made no substantial changes to
that part of the code before then. This isn't to say that there aren't
differences between 1.6.2 and it, but I'm not aware of any.
When I run benchmark/reader.py I have about 1.1 GB of memory in use when
not optimised and 900 MB when optimised.
> If you have more patches, I'm here to test them.
I can't think of anything easy at the moment and, based on my benchmarks,
I can only think that the memory use you're seeing is caused by other
parts of the code. Unfortunately, I have little experience with memory
optimisation in Python. Adam has written some docs on investigating
memory use, but we need to formalise them so that we can make changes
based on reproducible scenarios. How are you checking memory use? Do you
have some test files we could use?
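One reproducible way to check Python-level allocations is tracemalloc (stdlib in Python 3.4+; around the time of this thread a backport existed as pytracemalloc). It only counts Python objects, not total process RSS, and the workload below is just a placeholder for the actual read loop:

```python
import tracemalloc

tracemalloc.start()

# Placeholder workload; substitute the openpyxl read loop being measured.
data = [list(range(100)) for _ in range(1000)]

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("current: %.1f MB, peak: %.1f MB" % (current / 1e6, peak / 1e6))
```

Peak numbers from a script like this would give us something comparable across versions, unlike eyeballing the process in a task manager.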
As things stand I don't think it should stop a release of 1.8. I've
already done a lot of work in 1.9 to harmonise the interfaces for normal
and optimised readers so that future changes (such as data_only) are more
reliable and test isolation is higher. I'm also hoping to meet up with
Eric at FOSDEM at the end of the month and discuss things like this with
him.
Charlie