I have a program that creates lots of small objects (actually some large
objects that are made up of many sub-objects) and retrieves them through a
dictionary. Depending on the machine the program is run on, there comes a
point where deleting them from the dictionary takes a long time. Plotting
the numbers shows a relationship which is distinctly exponential in shape.
Machine A:
Objects    Dictionary Delete Time
   600               2
   800              18
  1000              97
  1200             344
Machine B:
Objects    Dictionary Delete Time
    25               0
    50               5
    75              14
   100              37
The behavior is the same on Windows and Unix. The numbers change but the
underlying shape of the graph stays the same.
Any lore for dealing with this type of situation greatly appreciated.
regards,
Sean McGrath
Yup. This is being worked on for Python 2.3, for the case where the
tracked objects are tuples. If it's not tuples causing the problem for
you, you'll need to manually control gc (or simply disable it if you
know that you don't have any cycles).
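A minimal sketch of what that manual gc control can look like (the workload
in the middle is an assumed placeholder, not the poster's code):

```python
import gc

# Sketch: disable automatic cyclic collection while churning through many
# objects, then collect once by hand. This is only safe if the objects form
# no reference cycles (or you can tolerate delayed collection of them).
gc.disable()                     # turn off automatic collection passes

garbage = {i: [i] * 10 for i in range(10000)}   # placeholder workload
garbage.clear()                  # delete without gc interference

unreachable = gc.collect()       # one manual pass at a convenient moment
gc.enable()                      # restore normal behavior
print("unreachable objects found:", unreachable)
```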
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"In the end, outside of spy agencies, people are far too trusting and
willing to help." --Ira Winkler
Sorry, this is too vague to work with. Start with which version of Python
you're using, and try to come up with a self-contained small test case
exhibiting the symptom.
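Not the original poster's code, but a self-contained harness of the kind
asked for here might look like this (the Node class and sizes are invented
stand-ins):

```python
import time

class Node:
    """Hypothetical stand-in for the poster's sub-object-heavy objects."""
    def __init__(self, i):
        self.data = [[i] * 5 for _ in range(10)]

def time_delete(n):
    d = {i: Node(i) for i in range(n)}
    start = time.time()
    d.clear()                    # drop all references; objects finalised here
    return time.time() - start

for n in (1000, 2000, 4000):     # double n and watch how delete time scales
    print(n, time_delete(n))
```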
> Plotting the numbers shows a relationship which is distinctly
> exponential in shape.
>
> Machine A:
> Objects    Dictionary Delete Time
>    600               2
>    800              18
>   1000              97
>   1200             344
>
> Machine B:
> Objects    Dictionary Delete Time
>     25               0
>     50               5
>     75              14
>    100              37
If I had any idea what you were doing, this *might* help <wink>.
BTW, why did you mention "garbage collection" in the subject line? Does the
problem go away if you do gc.disable() at the start?
> The behavior is the same on Windows and Unix.
At the start, you said this happens "depending on the machine". So what
varies besides Windows-vs-Unix that *does* make a difference? Also be
specific about which flavors of Windows and Unix you're using (different
flavors of Windows have radically different memory behavior, and ditto
Unix).
> The numbers change but the underlying shape of the graph stays
> the same.
>
> Any lore for dealing with this type of situation greatly appreciated.
If you could try it under current CVS Python, that might be interesting --
pymalloc is enabled by default in the current CVS build, and several times
pymalloc has cured cases where the *platform* malloc/free exhibit
quadratic-time behavior.
Python 1.5.2 but I've tried with 2.1.3 as well and get the same result.
>and try to come up with a self-contained small test case
>exhibiting the symptom.
The system in question runs to many thousands of lines of code, so getting
a standalone test case is proving interesting, but work continues apace to
do so.
>> The behavior is the same on Windows and Unix.
>At the start, you said this happens "depending on the machine". So what
>varies besides Windows-vs-Unix that *does* make a difference? Also be
>specific about which flavors of Windows and Unix you're using (different
>flavors of Windows have radically different memory behavior, and ditto
>Unix).
Sorry, I meant that the absolute numbers vary from machine to machine but the
underlying problem is the same. Windows 2000, NT and Solaris 2.6.
I have spent the day investigating this and the problem seems to
be to do with cPickle!
I have pickled versions of my reasonably complex objects.
If I load, say, 200 of these from their pickles, deleting them takes quite
a long time. If, however, I create 200 duplicate objects using
copy.deepcopy(), the deletion time is significantly faster.
It does not matter how I store references to the objects - dictionaries,
lists, variable names.
It seems that whenever they are finalised - even if that is triggered by
dropping off the end of the program - the long wait starts...
In my last test, 140 objects read from pickles took 323 seconds to delete.
By switching to deepcopy() it dropped to 45 seconds - about 7 times faster.
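A hedged reconstruction of that comparison (the Node class and counts are
assumptions; substitute the real objects):

```python
import copy
import pickle   # the thread uses cPickle; Python 3's pickle uses the C version
import time

class Node:
    """Invented stand-in for the reasonably complex pickled objects."""
    def __init__(self):
        self.children = [list(range(20)) for _ in range(20)]

def timed_delete(objs):
    start = time.time()
    objs.clear()                 # drop all references at once
    return time.time() - start

template = Node()
data = pickle.dumps(template)

unpickled = [pickle.loads(data) for _ in range(200)]
copied = [copy.deepcopy(template) for _ in range(200)]

print("delete unpickled :", timed_delete(unpickled))
print("delete deepcopied:", timed_delete(copied))
```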
Does this scenario ring any bells with anyone?
regards,
Sean
Neil
I'm seeing curious figures (included below). Basically,
Python 1.5.2 exhibits a big difference in deletion time for
objects loaded via cPickle as opposed to pickle.py.
Enabling/disabling gc with Python 2.1.3 does not make any
appreciable difference. However, delete performance in
2.1.3 seems significantly worse than what I'm seeing
with Python 1.5.2 (30 seconds against 5 seconds).
I know that the ideal situation here would be a standalone
test case that I could post but I'm struggling with that
one on a number of fronts:-)
The figures I'm seeing:-
Python 2.1.3
load objects with pickle module:
gc enabled:
Load Time : 63
Delete Time : 31
gc disabled:
Load Time : 63
Delete Time : 32
load objects with cPickle module:
gc enabled:
Load Time : 42
Delete Time : 32
gc disabled:
Load Time : 40
Delete Time : 32
Python 1.5.2:
load objects with pickle module:
Load Time : 80
Delete Time : 5 (!!!!!!!!!!!!!!!!)
load with cPickle module:
Load Time : 40
Delete Time : 43
Has the pickle format changed from 1.5.2 to 2.1.3? Is the fact that the
pickles were created with cPickle 1.5.2 skewing things, I wonder?
regards,
Sean
> I'm seeing curious figures (included below). Basically,
> Python 1.5.2 exhibits a big difference in deletion time for
> objects loaded via cPickle as opposed to pickle.py
My guess is that there is some large dictionary somewhere that is
cleared with a foolish algorithm such as

    k = d.keys()
    while k:
        del d[k[0]]
        del k[0]
I believe that this algorithm has performance characteristics similar
to the one you describe.
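For contrast, a small sketch (not from the thread) of that quadratic pattern
next to the linear dict.clear(); in modern Python, d.keys() returns a view,
so list() is needed to reproduce the old idiom:

```python
def slow_clear(d):
    """The foolish pattern: quadratic because del k[0] shifts the list."""
    k = list(d.keys())
    while k:
        del d[k[0]]     # O(1) dict deletion...
        del k[0]        # ...but O(len(k)) list shift on every iteration

def fast_clear(d):
    """Linear in the number of entries."""
    d.clear()

d1 = {i: str(i) for i in range(1000)}
d2 = {i: str(i) for i in range(1000)}
slow_clear(d1)
fast_clear(d2)
print(len(d1), len(d2))
```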
Regards,
Martin