Is this a bug? Python intermittently stops dead for seconds

charlie strauss

unread,

Oct 1, 2006, 2:52:13 AM10/1/06

to pytho...@python.org

Below is a simple program that will cause python to intermittently
stop executing for a few seconds. it's 100% reproducible on my machine.

I'd be tempted to say this is a nasty garbage collection performance
issue except that there is no major memory to be garbage collected in
this script. I'd be tempted to say it was a unix virtual memory
issue except this is occuring at around 1/5th of my physical memory
size. So something else unexplained is going on

Class Foo instances create and hold a list of size nfoo of integers.
(nfoo =10)

Class Bar instances create and hold a list of size nbar of Foo
objects. (nbar =100)

When the code runs it starts creating and storing Bar objects in a
list while watching for real-time glitches in how long it takes to
create the next Foo object. If this is longer than 1/2 of a second
then it reports it.

On my computer after creating 1500 Bar objects, the rate of
creation of new Foo suddenly has a periodic glitch. This glitch re-
occurs about every 300 Bar Objects, and the duration of the glitch
keeps getting longer--growing to many seconds!!!!

Platform: 800Mhz powermac g 4 1Gb of memory
python: python 2.4.2

Note: since I an using absolute time threshold for reporting the
glitches, the first one may take more iterations before it occurs on
a fast computer. You may need to increase nbar or nfoo.

import sys
from time import time

class Foo(object):
def __init__(me,nfoo):
me.memory = [1]*nfoo

class Bar(object):
def __init__(me,nbar,nfoo):
tgx.set_tag("Bar") # timer

b = [None]*nbar
for i in xrange(nbar):
b[i]=Foo(nfoo) # make a foo, add it to list
tgx.jump("Bar"+`i`) #timer

me.b = b #store the list in my instance memory

# just a utility class to time things.
class gtime:
def __init__(me,f=sys.stderr):
me.last = time()
me.f=f
me.tag = ""
me.ticks = 0

def set_tag(me,tag):
me.tag = tag
me.ticks = 0

def reset(me):
me.ticks = 0

def jump(me,tag="NONE",allowed_gap=0.5):
me.t = time()
me.ticks +=1
if me.t-me.last>allowed_gap:
print >>me.f,"Big Gap:",me.t-me.last,"seconds
",me.tag,tag,me.ticks
me.last = time()

tgx = gtime() # instance of the timer

# main loop
nbar = 100
nfoo = 10

ad_nauseum = 20000
final = [None]*ad_nauseum

for i in xrange(ad_nauseum ):
if i%100 == 0 :print >>sys.stderr,"Bars Made: ",i
final[i] = Bar(nbar,nfoo)

sample Output:

Bars Made: 0
Bars Made: 100
Bars Made: 200
Bars Made: 300
Bars Made: 400
Bars Made: 500
Bars Made: 600
Bars Made: 700
Bars Made: 800
Bars Made: 900
Bars Made: 1000
Bars Made: 1100
Bars Made: 1200
Bars Made: 1300
Bars Made: 1400
Bars Made: 1500
Big Gap: 0.618862867355 seconds Bar Bar5 6
Bars Made: 1600
Bars Made: 1700
Bars Made: 1800
Big Gap: 0.748915195465 seconds Bar Bar76 77
Bars Made: 1900
Bars Made: 2000
Bars Made: 2100
Big Gap: 0.809149980545 seconds Bar Bar45 46
Bars Made: 2200
Bars Made: 2300
Bars Made: 2400
Big Gap: 0.898494958878 seconds Bar Bar15 16
Bars Made: 2500
Bars Made: 2600
Bars Made: 2700
Big Gap: 1.01110696793 seconds Bar Bar86 87
Bars Made: 2800
Bars Made: 2900
Bars Made: 3000
Big Gap: 1.12396192551 seconds Bar Bar55 56
Bars Made: 3100
Bars Made: 3200
Bars Made: 3300
Big Gap: 1.39006495476 seconds Bar Bar25 26
Bars Made: 3400
Bars Made: 3500
Bars Made: 3600
Big Gap: 1.55699706078 seconds Bar Bar96 97
Bars Made: 3700
Bars Made: 3800
Bars Made: 3900
Big Gap: 1.49929594994 seconds Bar Bar65 66
Bars Made: 4000
Bars Made: 4100
Bars Made: 4200
Big Gap: 1.64840602875 seconds Bar Bar35 36
Bars Made: 4300
Bars Made: 4400
Bars Made: 4500
Big Gap: 1.728484869 seconds Bar Bar5 6
Bars Made: 4600
Bars Made: 4700
Bars Made: 4800

Giovanni Bajo

unread,

Oct 1, 2006, 3:13:25 AM10/1/06

to

charlie strauss wrote:

> Below is a simple program that will cause python to intermittently
> stop executing for a few seconds. it's 100% reproducible on my
> machine.

Confirmed with Python 2.4.2 on Windows.

gc.disable() fixes it, so it looks like you found an inefficiency in the
Python's GC. I have no idea whether this would be considered a bug by Python's
developer, but you can try opening a bugreport...
--
Giovanni Bajo

John Machin

unread,

Oct 1, 2006, 4:09:47 AM10/1/06

to

Reproduced on 2.4.3 and 2.5 on Windows.
Disabling GC fixes the speed problem as Giovanni said, but doesn't
reduce the amount of memory taken (measured by increase in "page file
used" display in Task Manager). At about 520 Mb, this seems rather too
high to me.

Definitely worth reporting, even if the outcome is only(!) a timbottian
dissertation of why it's not a problem -- at least we'd learn something
:-)

Cheers,
John

Daniel Nogradi

unread,

Oct 1, 2006, 4:37:35 AM10/1/06

to pytho...@python.org

This is because the OP violated the Style Guide (PEP 8) and used 'me'
instead of 'self' as the first argument of instance methods, the
couple of seconds delay in runtime is the revenge of the vicious
interpreter :)

Jorgen Grahn

unread,

Oct 1, 2006, 4:41:06 AM10/1/06

to

On Sun, 01 Oct 2006 07:13:25 GMT, Giovanni Bajo <no...@sorry.com> wrote:
> charlie strauss wrote:
>
>> Below is a simple program that will cause python to intermittently
>> stop executing for a few seconds. it's 100% reproducible on my
>> machine.
>
> Confirmed with Python 2.4.2 on Windows.

And Python 2.3.5 on Linux, amd64. In fact, it causes heavy swapping so it
will disrupt other processes, too.

I didn't read the code, stupid as I am, but I trust that the behavior
doesn't match what the code actually tries to do.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org> R'lyeh wgah'nagl fhtagn!

charlie strauss

unread,

Oct 1, 2006, 10:33:07 AM10/1/06

to Steve Holden, pytho...@python.org

Steve and other good folks who replied:

I want to clarify that, on my computer, the first instance of the gap occurs way before the memory if filled. (at about 20% of physical ram). Additionally the process monitor shows no page faults.

Yes if you let the as-written demo program run to completetion (all 20,000 iterations) then on many computers it would not be surprising that your computer eventually goes into forced page swapping at some point. That would be expected and is not the issue than the one I am concerned with.

in my case starts glicthing at around iteration 1000.

1000(bars) x 100(foos)x(10 integers in array)

is nominally
100,000 class objects and
1,000,000 array elements.

(note that the array if filled as [1]*10, so there is actually only one "integer", but 10 array elements refering to it, per foo class.)

However steve may have put his finger on the reason why the duration grows with time. Here is my current hypothesis. The design of the program does not have and points where significant amounts of memory are released: all objects have held references till the end. But prehaps there are some implicitly created objects of the same size created along the way??? For example when I write

me.memory = [1]*nfoo

perhaps internally, python is allocating an array of size foo and then __copying__ it into me.memory??? Since there is no reference to the intermediate it would then marked for future garbage collection.

If that were true then the memory would have interleaved entities of things to GC and things with references held in me.memory.

Then to remove these would require GC to scan the entire set of existing objects, which is growing.

Turning off GC would prevent this.

In any case I don't think what I'm doing is very unusual. The original program that trigger my investigation of the bug was doing this:

foo was an election ballot holding 10 outcomes, and bar was a set of 100 ballots from 100 voting machines, and the total array was holding the ballot sets from a few thousand voting machines.

Almost any inventory program is likely to have such a simmilar set of nested array, so it hardly seems unusual.

charlie strauss

unread,

Oct 1, 2006, 10:43:58 AM10/1/06

to charlie strauss, Steve Holden, pytho...@python.org

By the way if you are on a fast computer, and an OS whose time.time() function can resolve less than 0.5 seconds then you can see this problem on your machine at lower memory utilizations by changing the value of the default "allowed_gap" in the gtime class from 0.5 seconds down to say 0.1 second.
This is the threshold for which the computer program flags the time it takes to create a "foo" object. on a fast computer it should take much less than 0.1 sec.

>--
>http://mail.python.org/mailman/listinfo/python-list

Steve Holden

unread,

Oct 1, 2006, 10:48:12 AM10/1/06

to pytho...@python.org

charlie strauss wrote:
> Steve and other good folks who replied:
>
> I want to clarify that, on my computer, the first instance of the gap occurs way before the memory if filled. (at about 20% of physical ram). Additionally the process monitor shows no page faults.
>
> Yes if you let the as-written demo program run to completetion (all 20,000 iterations) then on many computers it would not be surprising that your computer eventually goes into forced page swapping at some point. That would be expected and is not the issue than the one I am concerned with.
>
> in my case starts glicthing at around iteration 1000.
>
> 1000(bars) x 100(foos)x(10 integers in array)
>
> is nominally
> 100,000 class objects and
> 1,000,000 array elements.
>
> (note that the array if filled as [1]*10, so there is actually only one "integer", but 10 array elements refering to it, per foo class.)
>
>
> However steve may have put his finger on the reason why the duration grows with time. Here is my current hypothesis. The design of the program does not have and points where significant amounts of memory are released: all objects have held references till the end. But prehaps there are some implicitly created objects of the same size created along the way??? For example when I write
>
> me.memory = [1]*nfoo
>
> perhaps internally, python is allocating an array of size foo and then __copying__ it into me.memory??? Since there is no reference to the intermediate it would then marked for future garbage collection.
>
> If that were true then the memory would have interleaved entities of things to GC and things with references held in me.memory.
>
> Then to remove these would require GC to scan the entire set of existing objects, which is growing.
>
> Turning off GC would prevent this.
>
>
> In any case I don't think what I'm doing is very unusual. The original program that trigger my investigation of the bug was doing this:
>
> foo was an election ballot holding 10 outcomes, and bar was a set of 100 ballots from 100 voting machines, and the total array was holding the ballot sets from a few thousand voting machines.
>
> Almost any inventory program is likely to have such a simmilar set of nested array, so it hardly seems unusual.
>
>
>
>
>

I think the point you are missing is that the garbage collector is
triggered from time to time to ensure that no cyclical garbage remains
uncollected, IIRC. The more data that's been allocated, the longer it
takes the collector to scan all of memory to do its job.

If you can find a way to avoid the behaviour I'm sure the development
team would be interested to hear it :-)

I think you'll find that most programs that eat through memory in this
way will exhibit pretty much the same behaviour. If you *know* your
program isn't creating data cycles, just turn the GC off and rely on
reference counting. That won't save you from paging when you eventually
exhaust physical memory. Nothing can.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

charlie strauss

unread,

Oct 1, 2006, 11:22:59 AM10/1/06

to pytho...@python.org

>>
>I think the point you are missing is that the garbage collector is
>triggered from time to time to ensure that no cyclical garbage remains
>uncollected, IIRC. The more data that's been allocated, the longer it
>takes the collector to scan all of memory to do its job.
>
>If you can find a way to avoid the behaviour I'm sure the development
>team would be interested to hear it :-)

>
>I think you'll find that most programs that eat through memory in this
>way will exhibit pretty much the same behaviour. If you *know* your
>program isn't creating data cycles, just turn the GC off and rely on
>reference counting. That won't save you from paging when you eventually
>exhaust physical memory. Nothing can.

Could you clarify that for me. GC really has three components two it: 1) finding and freeing unrefernced memory by refer counts 2) cycle removal and 3) defragementing the storage stack. If I turn off GC, don't I lose all of these?

>From a user perspective, turning off GC and then managing it yourself is unappealing. What would be preferrable would be to be able to simply turn down it's threshold. That is, what I really want is to tell GC it can hold off and checks other than reference counts until X% of the memory is filled. At some point I want it to kick in, and I don't want to have to programatically manage that, but simply specify a simple tolerance.

Even better , I'd like to keep 1 and 3 and turn off just 2 and just use weak reference in the few cases I really need them.

charlie strauss

unread,

Oct 1, 2006, 11:37:24 AM10/1/06

to Steve Holden, pytho...@python.org

Steve, digging into the gc docs a bit more, I think the behaviour I am seeing is still not expected. Namely, the program I offered has no obvious place where objects are deallocated. The way GC is supposed to work is thate there are three levels of objects

level0: newly created objects
level1: objects that survived 1 round of garbage collection
level2: objects that survivied 2+ rounds of gargbage collection

Since all of my numerous objects are level2 objects, and none of them are every deallocated, then I should never trip the GC for these.

Your explanation would require this to be tripped so I can't explain it. For your explanation to be correct then there as to be some non-obvious step in the program that is deallocating level2 items in sufficient numbers to trip the GC.

Fredrik Lundh

unread,

Oct 1, 2006, 11:46:57 AM10/1/06

to pytho...@python.org

charlie strauss wrote:

> Could you clarify that for me. GC really has three components
> two it: 1) finding and freeing unrefernced memory by refer
> refer counts 2) cycle removal and 3) defragementing the
> storage stack. If I turn off GC, don't I lose all of these?

CPython always does (1), only does (2) if cycle-breaking GC isn't
disabled, and never does (3).

in your case, it's (2) that takes more and more time, simply because
you're creating tons of non-trivial objects. to see what's going on in
there, add

import gc
gc.set_debug(gc.DEBUG_STATS)

to the top of your program, and look at the status messages that appear
just before each "Big Gap" message.

</F>

Fredrik Lundh

unread,

Oct 1, 2006, 11:48:38 AM10/1/06

to pytho...@python.org

charlie strauss wrote:

> level0: newly created objects
> level1: objects that survived 1 round of garbage collection
> level2: objects that survivied 2+ rounds of gargbage collection
>
> Since all of my numerous objects are level2 objects, and none of
> them are every deallocated, then I should never trip the GC for
> these.

your understanding of generational GC is a bit fuzzy, it seems. that an
object is promoted to a higher level means that it's not inspected quite
as often as lower-level objects, not that it's never inspected at all.

</F>

Steve Holden

unread,

Oct 1, 2006, 12:13:19 PM10/1/06

to charlie strauss, pytho...@python.org

charlie strauss wrote:
> Steve, digging into the gc docs a bit more, I think the behaviour I am seeing is still not expected. Namely, the program I offered has no obvious place where objects are deallocated. The way GC is supposed to work is thate there are three levels of objects
>

> level0: newly created objects
> level1: objects that survived 1 round of garbage collection
> level2: objects that survivied 2+ rounds of gargbage collection
>
> Since all of my numerous objects are level2 objects, and none of them are every deallocated, then I should never trip the GC for these.
>

> Your explanation would require this to be tripped so I can't explain it. For your explanation to be correct then there as to be some non-obvious step in the program that is deallocating level2 items in sufficient numbers to trip the GC.
>

So perhaps you can explain why switching garbage collection off changes
program behaviour? If you read the documentation more carefully you will
see that the collector merely scans generations 1 and 2 less frequently
that generation 0.

Charlie Strauss

unread,

Oct 1, 2006, 12:19:45 PM10/1/06

to pytho...@python.org

On Oct 1, 2006, at 9:48 AM, Fredrik Lundh wrote:

> charlie strauss wrote:
>
>> level0: newly created objects
>> level1: objects that survived 1 round of garbage collection
>> level2: objects that survivied 2+ rounds of gargbage collection
>>
>> Since all of my numerous objects are level2 objects, and none of
>> them are every deallocated, then I should never trip the GC for
>> these.
>

> your understanding of generational GC is a bit fuzzy, it seems.
> that an
> object is promoted to a higher level means that it's not inspected
> quite
> as often as lower-level objects, not that it's never inspected at all.

As I understand it, level2 (and level1) objects only undergo gc when
more than 10 of them is deallocated. Since I never deallocate, this
should not be tripped right?

In any case regarding your other comments:

>> Could you clarify that for me. GC really has three components
>> two it: 1) finding and freeing unrefernced memory by refer
>> refer counts 2) cycle removal and 3) defragementing the
>> storage stack. If I turn off GC, don't I lose all of these?
>>
>
> CPython always does (1), only does (2) if cycle-breaking GC isn't
> disabled, and never does (3).

Never does 3? then how does it compact it's memory allocation area?
Surely it does not rely on the OS to manage every small object as a
separate memory allocation.

And just to be clear: are you saying that when I do a gc.disable this
only turns off 2 and not 1? The docs don't say that as far as I can
tell.

> in your case, it's (2) that takes more and more time, simply because
> you're creating tons of non-trivial objects. to see what's going
> on in
> there, add
>
> import gc
> gc.set_debug(gc.DEBUG_STATS)
>
> to the top of your program, and look at the status messages that
> appear
> just before each "Big Gap" message.

Could you be a bit more explicit. I'm new to the gc module. Sorry
to be slow but I don't know what "looking at the status message" means.

>
> </F>
>
> --
> http://mail.python.org/mailman/listinfo/python-list

Fredrik Lundh

unread,

Oct 1, 2006, 12:36:00 PM10/1/06

to pytho...@python.org

Charlie Strauss wrote:

> Sorry to be slow but I don't know what "looking at the status
> message" means.

did you add

import gc
gc.set_debug(gc.DEBUG_STATS)

to the top of the program? (you need to run it to see the status
messages, of course)

</F>

Roel Schroeven

unread,

Oct 1, 2006, 12:48:22 PM10/1/06

to

Charlie Strauss schreef:

> On Oct 1, 2006, at 9:48 AM, Fredrik Lundh wrote:
>> charlie strauss wrote:
>>> Could you clarify that for me. GC really has three components
>>> two it: 1) finding and freeing unrefernced memory by refer
>>> refer counts 2) cycle removal and 3) defragementing the
>>> storage stack. If I turn off GC, don't I lose all of these?
>>>
>> CPython always does (1), only does (2) if cycle-breaking GC isn't
>> disabled, and never does (3).
>

[snip]

> And just to be clear: are you saying that when I do a gc.disable this
> only turns off 2 and not 1? The docs don't say that as far as I can
> tell.

AFAIK Python always does reference counting, and the garbage collector
is used only for more difficult cases. As the gc module docs say:
"Since the collector supplements the reference counting already used in
Python, you can disable the collector if you are sure your program does
not create reference cycles."

I don't know if that's only true for CPython or also for the other
implementations.

--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven

Steve Holden

unread,

Oct 1, 2006, 1:11:25 PM10/1/06

to pytho...@python.org

Roel Schroeven wrote:
> Charlie Strauss schreef:
>
>>On Oct 1, 2006, at 9:48 AM, Fredrik Lundh wrote:
>>
>>>charlie strauss wrote:
>>>
>>>>Could you clarify that for me. GC really has three components
>>>>two it: 1) finding and freeing unrefernced memory by refer
>>>>refer counts 2) cycle removal and 3) defragementing the
>>>>storage stack. If I turn off GC, don't I lose all of these?
>>>>
>>>
>>>CPython always does (1), only does (2) if cycle-breaking GC isn't
>>>disabled, and never does (3).
>>
> [snip]
>
>>And just to be clear: are you saying that when I do a gc.disable this
>>only turns off 2 and not 1? The docs don't say that as far as I can
>>tell.
>
>
> AFAIK Python always does reference counting, and the garbage collector
> is used only for more difficult cases. As the gc module docs say:
> "Since the collector supplements the reference counting already used in
> Python, you can disable the collector if you are sure your program does
> not create reference cycles."
>
> I don't know if that's only true for CPython or also for the other
> implementations.
>

Read the documentation: the garbage collector is called regularly in
CPython.

Fredrik Lundh

unread,

Oct 1, 2006, 1:24:22 PM10/1/06

to pytho...@python.org

Roel Schroeven wrote:

> AFAIK Python always does reference counting, and the garbage collector
> is used only for more difficult cases. As the gc module docs say:
> "Since the collector supplements the reference counting already used in
> Python, you can disable the collector if you are sure your program does
> not create reference cycles."
>
> I don't know if that's only true for CPython or also for the other
> implementations.

CPython always uses reference counting, but that's not guaranteed by the
language specification:

"Objects are never explicitly destroyed; however, when they become
unreachable they may be garbage-collected. An implementation is
allowed to postpone garbage collection or omit it altogether -- it
is a matter of implementation quality how garbage collection is
implemented, as long as no objects are collected that are still
reachable."

http://pyref.infogami.com/objects

</F>

Giovanni Bajo

unread,

Oct 1, 2006, 1:40:35 PM10/1/06

to

Steve Holden wrote:

> I think you'll find that most programs that eat through memory in this
> way will exhibit pretty much the same behaviour. If you *know* your
> program isn't creating data cycles, just turn the GC off and rely on
> reference counting. That won't save you from paging when you
> eventually exhaust physical memory. Nothing can.

Even better, if you know that you're *creating* tons of objects without
creating many *discardable* cycles at the same, it's better to turn off GC
collection during loading, and only do a single pass (gc.collect()) when you
are done with the allocations.
--
Giovanni Bajo

Tim Peters

unread,

Oct 1, 2006, 4:12:11 PM10/1/06

to pytho...@python.org

[charlie strauss]

> Below is a simple program that will cause python to intermittently
> stop executing for a few seconds. it's 100% reproducible on my machine.

Any program that creates a great many long-lived container objects
will behave similarly during the creation phase. Others have
explained why. Here's a simpler example:

from time import time
xs = []
i = 0
while 1:
i += 1
s = time()
xs.append([[1, 2] for dummy in xrange(1000)])
f = time()
if f-s > 0.25:
print "time", f-s, "on try", i
if i % 100 == 0:
print "xs made:", i

Tim Peters

unread,

Oct 1, 2006, 4:16:04 PM10/1/06

to pytho...@python.org

[charlie strauss]

>>> Below is a simple program that will cause python to intermittently
>>> stop executing for a few seconds. it's 100% reproducible on my
>>> machine.

[Giovanni Bajo]

>> Confirmed with Python 2.4.2 on Windows.

[Jorgen Grahn]

> And Python 2.3.5 on Linux, amd64. In fact, it causes heavy swapping so it
> will disrupt other processes, too.
>
> I didn't read the code, stupid as I am, but I trust that the behavior
> doesn't match what the code actually tries to do.

No, that's what it does: as it goes on, it creates an ever-increasing
and unbounded number of immortal objects and lists. The "time burps"
merely reflect that the more live containers and objects you have, the
longer it takes for cyclic gc to determine that they're not trash.

Tim Peters

unread,

Oct 1, 2006, 4:17:38 PM10/1/06

to pytho...@python.org

[Steve Holden, "pins the blame" for pauses on periodic cyclic gc]
> ...
> So basically what you have here is a pathological example of why it's
> sometimes wise to disable garbage collection. Tim, did I miss anything?

Nope!

Tim Peters

unread,

Oct 1, 2006, 4:28:57 PM10/1/06

to pytho...@python.org

[charlie strauss]

> I want to clarify that, on my computer, the first instance of the gap occurs way
> before the memory if filled. (at about 20% of physical ram). Additionally the
> process monitor shows no page faults.

Python has no idea of how much RAM you have, or even of how much RAM
it's using. See the `gc` module docs, function set_threshold(), for a
description of when Python decides to run a cyclic gc pass. That's
all about the current excess (if any) of the number of container
allocations over the number of container deallocations since the last
time cyclic gc ran.

> ...

> (note that the array if filled as [1]*10, so there is actually only one "integer",
> but 10 array elements refering to it, per foo class.)

Nevertheless cyclic gc has to examine all 10 array elements each time
the list is scanned.

> ...

> For example when I write
>
> me.memory = [1]*nfoo
>
> perhaps internally, python is allocating an array of size foo and then __copying__ it
> into me.memory???

No. Assignments in Python never copy anything.

> Since there is no reference to the intermediate it would then marked for future
> garbage collection.

Nothing in Python is "marked for future garbage collection". From
time to time /all/ container objects are scanned to determine whether
any have become cyclic trash. This takes time proportional to the
total number of containees across all container objects.

...

> foo was an election ballot holding 10 outcomes, and bar was a set of 100 ballots
> from 100 voting machines, and the total array was holding the ballot sets from a few
> thousand voting machines.
>
> Almost any inventory program is likely to have such a simmilar set of nested array,
> so it hardly seems unusual.

For memory-consumption reasons alone, a serious <wink> such program is
likely to use array.array objects at leaf level instead. array.array
objects hold raw bits, not Python objects. Since they don't hold
Python objects, they can't possibly be part of a cycle, so cyclic gc
doesn't need to scan array.array contents either.

Tim Peters

unread,

Oct 1, 2006, 4:37:40 PM10/1/06

to charlie strauss, pytho...@python.org

[charlie strauss]

> Steve, digging into the gc docs a bit more, I think the behaviour I am seeing is still
> not expected. Namely, the program I offered has no obvious place where objects
> are deallocated. The way GC is supposed to work is thate there are three levels of
> objects
>

> level0: newly created objects
> level1: objects that survived 1 round of garbage collection
> level2: objects that survivied 2+ rounds of gargbage collection

Yes.

> Since all of my numerous objects are level2 objects,

No. All newly created objects start in level0. Over time, most (but
never all) of your objects end up in level 2.

> and none of them are every deallocated, then I should never trip the GC for these.

Cyclic gc scans /all/ container objects at or under level N, whenever
cyclic gc runs. N varies from run to run of cyclic gc, according to
the scheme described in the docs for the gc.set_threshold() function.
N=2 is certainly a possible value. There is never a time when a live
container object becomes exempt from all future runs of cyclic gc. If
a live container object survives two rounds of cyclic gc, it ends up
lin level2. That doesn't mean it's never looked at again, it just
means it's not looked at during level0 or level1 (N=0 or N=1) runs of
cyclic gc. It will still be looked at during all level2 (N=2) runs of
cyclic gc.

> Your explanation would require this to be tripped so I can't explain it. For your
> explanation to be correct then there as to be some non-obvious step in the program
> that is deallocating level2 items in sufficient numbers to trip the GC.

Deallocations never trigger cyclic gc. As the docs say, cyclic gc is
triggered by an /excess/ of allocations /over/ deallocations. So,
e.g., if you delete container objects just as fast as you create them,
cyclic gc will never run. But that's not what you're doing. Instead
you're allocating new objects but never deallocating them. That makes
cyclic gc run as frequently as it's possible for it to run.

Tim Peters

unread,

Oct 1, 2006, 4:49:33 PM10/1/06

to pytho...@python.org

[Charlie Strauss]

>>> level0: newly created objects
>>> level1: objects that survived 1 round of garbage collection
>>> level2: objects that survivied 2+ rounds of gargbage collection
>>>

>>> Since all of my numerous objects are level2 objects, and none of

>>> them are every deallocated, then I should never trip the GC for
>>> these.

[Fredrik Lundh]

>> your understanding of generational GC is a bit fuzzy, it seems.
>> that an object is promoted to a higher level means that it's
>> not inspected quite as often as lower-level objects, not that it's
>> never inspected at all.

[Charlie]

> As I understand it, level2 (and level1) objects only undergo gc when
> more than 10 of them is deallocated. Since I never deallocate, this
> should not be tripped right?

No. Cyclic gc is triggered by an excess of allocations over deallocations.

> In any case regarding your other comments:

>>> Could you clarify that for me. GC really has three components

>>> two it: 1) finding and freeing unrefernced memory by refer
>>> refer counts 2) cycle removal and 3) defragementing the
>>> storage stack. If I turn off GC, don't I lose all of these?

>> CPython always does (1), only does (2) if cycle-breaking GC isn't
>> disabled, and never does (3).

> Never does 3?

Correct.

> then how does it compact it's memory allocation area?

It doesn't.

> Surely it does not rely on the OS to manage every small object as a
> separate memory allocation.

It doesn't do that either. Python has its own small-object allocator,
carving up 256KB chunks obtained from the system malloc(). It's based
on size-segregated "pools" with extremely low bookkeeping overhead,
and external fragmentation in small-object memory is essentially
non-existent because of that (although it's possible to concoct
pathological programs that exhibit it).

> And just to be clear: are you saying that when I do a gc.disable this
> only turns off 2 and not 1?

Right. Refcounting (#1) can never be disabled, and cyclic GC (#2) is
used only for trash objects that can't be reclaimed by #1 (meaning
container objects in cycles).