
Exponential time increase in garbage collection


Sean McGrath

May 18, 2002, 5:52:19 AM
All,

I have a program that creates lots of small objects (actually some large
objects that are made up of many sub-objects) and retrieves them through
a dictionary.

Depending on the machine the program is run on, there comes a point where
deleting them from the dictionary takes a long time. Plotting the numbers
shows a relationship which is distinctly exponential in shape.

Machine A:
    Objects    Dictionary delete time
        600          2
        800         18
       1000         97
       1200        344

Machine B:
    Objects    Dictionary delete time
         25          0
         50          5
         75         14
        100         37

The behavior is the same on Windows and Unix. The numbers change but the
underlying shape of the graph stays the same.

Any lore for dealing with this type of situation greatly appreciated.

regards,
Sean McGrath


Aahz

May 18, 2002, 10:29:04 AM
In article <mailman.1021716090...@python.org>,

Sean McGrath <sean.m...@propylon.com> wrote:
>
>Depending on the machine the program is run on, there comes a point where
>deleting them from the dictionary takes a long time. Plotting the numbers
>shows a relationship which is distinctly exponential in shape.

Yup. This is being worked on for Python 2.3, for the case where the
tracked objects are tuples. If it's not tuples causing the problem for
you, you'll need to manually control gc (or simply disable it if you
know that you don't have any cycles).
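
A minimal sketch of that manual control (assuming, as above, that the
program really has no reference cycles; where you re-enable or collect
is your call):

    import gc

    gc.disable()     # no automatic cyclic GC; plain refcounting still
                     # frees acyclic objects as soon as they die
    # ... create and delete the objects ...
    gc.collect()     # optional: one explicit sweep at a point you choose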
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"In the end, outside of spy agencies, people are far too trusting and
willing to help." --Ira Winkler

Tim Peters

May 18, 2002, 10:48:16 AM
[Sean McGrath]

> I have a program that creates lots of small objects (actually some
> large objects that are made up of many sub-objects) and retrieves
> them through a dictionary.
>
> Depending on the machine the program is run on, there comes a point
> where deleting them from the dictionary takes a long time.

Sorry, this is too vague to work with. Start with which version of Python
you're using, and try to come up with a self-contained small test case
exhibiting the symptom.

> Plotting the numbers shows a relationship which is distinctly
> exponential in shape.
>

> Machine A:
> Objects Dictionary Delete Time
> 600 2
> 800 18
> 1000 97
> 1200 344
>
> Machine B:
> Objects Dictionary Delete Time
> 25 0
> 50 5
> 75 14
> 100 37

If I had any idea what you were doing, this *might* help <wink>.

BTW, why did you mention "garbage collection" in the subject line? Does the
problem go away if you do gc.disable() at the start?

> The behavior is the same on Windows and Unix.

At the start, you said this happens "depending on the machine". So what
varies besides Windows-vs-Unix that *does* make a difference? Also be
specific about which flavors of Windows and Unix you're using (different
flavors of Windows have radically different memory behavior, and ditto
Unix).

> The numbers change but the underlying shape of the graph stays
> the same.
>
> Any lore for dealing with this type of situation greatly appreciated.

If you could try it under current CVS Python, that might be interesting --
pymalloc is enabled by default in the current CVS build, and several times
pymalloc has cured cases where the *platform* malloc/free exhibit
quadratic-time behavior.

Sean McGrath

May 18, 2002, 1:08:43 PM
[Tim Peters]

>Sorry, this is too vague to work with. Start with which version of Python
>you're using,

Python 1.5.2 but I've tried with 2.1.3 as well and get the same result.

>and try to come up with a self-contained small test case
>exhibiting the symptom.

The system in question runs to many thousands of lines of code, so
getting a standalone test case is proving interesting, but work
continues apace to do so.

>> The behavior is the same on Windows and Unix.

>At the start, you said this happens "depending on the machine". So what
>varies besides Windows-vs-Unix that *does* make a difference? Also be
>specific about which flavors of Windows and Unix you're using (different
>flavors of Windows have radically different memory behavior, and ditto
>Unix).

Sorry, I meant that the absolute numbers vary from machine to machine but the
underlying problem is the same. Windows 2000, NT and Solaris 2.6.

I have spent the day investigating this and the problem seems to
be to do with cPickle!

I have pickled versions of my reasonably complex objects. If I load,
say, 200 of these from their pickles, deleting them takes quite a long
time. If, however, I create 200 duplicate objects using copy.deepcopy(),
the deleting time is significantly faster.

It does not matter how I store references to the objects - dictionaries,
lists, variable names. It seems that whenever they are finalised - even if
that is triggered by dropping off the end of the program - the long wait
starts...

In my last test, 140 objects read from pickles took 323 seconds to delete.
By switching to deepcopy() it dropped to 45 seconds - about 7 times faster.
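
A self-contained sketch of that comparison (the pickle path, the single
template object, and the loop count are illustrative, not the real
system):

    import time
    import copy
    import cPickle

    N = 140                          # matches the count above

    f = open('obj.pkl', 'rb')        # illustrative path to one pickled object
    template = cPickle.load(f)
    f.close()

    # N objects, each loaded from the pickle.
    pickled = []
    for i in range(N):
        f = open('obj.pkl', 'rb')
        pickled.append(cPickle.load(f))
        f.close()

    # N objects, each a deep copy of the one loaded object.
    copied = []
    for i in range(N):
        copied.append(copy.deepcopy(template))

    start = time.time()
    del pickled                      # finalise the cPickle-loaded objects
    print 'cPickle delete time :', time.time() - start

    start = time.time()
    del copied                       # finalise the deep-copied objects
    print 'deepcopy delete time:', time.time() - start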

Does this scenario ring any bells with anyone?

regards,
Sean

Neil Schemenauer

May 18, 2002, 5:38:38 PM
Why do you think the time is spent in GC? It sounds like the malloc
implementation is trying to defragment memory on free(). The current GC
has quadratic cost while allocating objects but not while deallocating
them. Can you try adding "gc.disable()" to the beginning of your program
and see if the problem goes away?

Neil


Sean McGrath

May 19, 2002, 5:47:20 AM

I'm seeing curious figures (included below). Basically, Python 1.5.2
exhibits a big difference in deletion time for objects loaded via
cPickle as opposed to pickle.py.

Enabling/disabling gc with Python 2.1.3 does not make any appreciable
difference. However, delete performance in 2.1.3 seems significantly
worse than what I'm seeing with Python 1.5.2 (30 seconds against 5
seconds).

I know that the ideal situation here would be a standalone
test case that I could post but I'm struggling with that
one on a number of fronts:-)

The figures I'm seeing:-

Python 2.1.3
    load objects with pickle module:
        gc enabled:
            Load Time   : 63
            Delete Time : 31
        gc disabled:
            Load Time   : 63
            Delete Time : 32
    load objects with cPickle module:
        gc enabled:
            Load Time   : 42
            Delete Time : 32
        gc disabled:
            Load Time   : 40
            Delete Time : 32

Python 1.5.2:
    load objects with pickle module:
        Load Time   : 80
        Delete Time : 5   (!!!!!!!!!!!!!!!!)
    load objects with cPickle module:
        Load Time   : 40
        Delete Time : 43

Has the pickle format changed from 1.5.2 to 2.1.3? Is the fact that the
pickles were created with cPickle under 1.5.2 skewing things, I wonder?

regards,
Sean


Martin v. Loewis

May 19, 2002, 7:13:38 AM
Sean McGrath <sean.m...@propylon.com> writes:

> I'm seeing curious figures (included below). Basically,
> Python 1.5.2 exhibits a big difference in deletion time for
> objects loaded via cPickle as opposed to pickle.py

My guess is that there is some large dictionary somewhere that is
cleared with a foolish algorithm such as

k = d.keys()
while k:
    del d[k[0]]    # remove the dictionary entry
    del k[0]       # shifts the whole list: O(len(k)) on every pass

I believe that this algorithm has performance characteristics similar
to the one you describe.
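
For contrast, a sketch of clearing the same dictionary in linear time:

    for key in d.keys():   # keys() returns a snapshot list, so this is safe
        del d[key]
    # or, simpler still:
    d.clear()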

Regards,
Martin
