
memory recycling/garbage collecting problem


Yuanxin Xi

Feb 17, 2009, 12:21:16 AM
to pytho...@python.org
I'm having some problems with the memory recycling/garbage collecting
of the following testing code:

>>> a=[str(i) for i in xrange(10000000)]
This takes 635m/552m/2044 memory (VIRT/RES/SHR)

>>> b={}
>>> for i in xrange(10000000):
...     b[str(i)]=i

Then the memory usage increased to 1726m/1.6g/2048

>>> del b
I expected the memory usage to drop back to the amount before b was
created (635m/552m/2044), but it's actually 1341m/1.2g/2048

Could anyone please explain why this happens? It seems some memory
is not freed. I'm running into problems with this, as my program is
very memory consuming and needs to frequently free some objects to
reuse the memory. What is the best way to free the memory of b
completely (i.e. back to the state before b was created)? I'm using
Python 2.5.2

Thanks.

Yuanxin

Chris Rebert

Feb 17, 2009, 12:31:25 AM
to Yuanxin Xi, pytho...@python.org

My understanding is that for efficiency purposes Python hangs on to
the extra memory even after the object has been GC-ed and doesn't give
it back to the OS right away. This shouldn't matter to your program
though, as the "extra" memory will still be available for its use. And
assuming you're using CPython and a/b isn't referred to by anything
else, I believe they should be GC-ed immediately by the refcount
system after the 'del' is executed. So, in theory you shouldn't need to
worry about any of this, unless your example is not accurate and
you're dealing with cyclical references or C objects or something.
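
For what it's worth, here's a minimal sketch of that refcounting point
(CPython 2.x; the Tracked class is just my throwaway example):

class Tracked(object):
    def __del__(self):
        print "reclaimed"   # fires the moment the refcount hits zero

t = Tracked()
del t   # prints "reclaimed" immediately -- no gc.collect() needed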

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com

Marc 'BlackJack' Rintsch

Feb 17, 2009, 2:26:28 AM
to
On Mon, 16 Feb 2009 23:21:16 -0600, Yuanxin Xi wrote:

> I'm having some problems with the memory recycling/garbage collecting of
> the following testing code:
>
>>>> a=[str(i) for i in xrange(10000000)]
> This takes 635m/552m/2044 memory (VIRT/RES/SHR)
>
>>>> b={}
>>>> for i in xrange(10000000):
> ...     b[str(i)]=i
>
> Then the memory usage increased to 1726m/1.6g/2048
>
>>>> del b
> I expected the memory usage to drop back to the amount before b was
> created (635m/552m/2044), but it's actually 1341m/1.2g/2048
>
> Could anyone please explain why this happens? It seems some memory
> is not freed.

It seems the memory is not given back to the operating system. This
doesn't mean that it is not freed by Python and can't be used again by
Python. Create the dictionary again and see if the memory usage rises
again or if it stays stable.
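
A quick way to run that experiment (a rough sketch -- pause at each
raw_input() and check VIRT/RES in top):

b = dict((str(i), i) for i in xrange(10000000))
raw_input("built once ")    # note the peak
del b
raw_input("deleted ")       # RES will likely stay high here
b = dict((str(i), i) for i in xrange(10000000))
raw_input("built again ")   # should not climb much past the first peak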

Ciao,
Marc 'BlackJack' Rintsch

Aaron Brady

Feb 17, 2009, 3:33:27 AM
to
On Feb 16, 11:21 pm, Yuanxin Xi <xi11w...@gmail.com> wrote:
> I'm having some problems with the memory recycling/garbage collecting
> of the following testing code:
>
> >>> a=[str(i) for i in xrange(10000000)]
>
> This takes 635m/552m/2044 memory (VIRT/RES/SHR)
>
> >>> b={}
> >>> for i in xrange(10000000):
>
> ...     b[str(i)]=i
>
> Then the memory usage increased to 1726m/1.6g/2048
>
> >>> del b
>
> I expected the memory usage to drop back to the amount before b was
> created (635m/552m/2044), but it's actually 1341m/1.2g/2048
snip

'gc.collect()', I believe -- but I'm not a specialist in it.

Chris Rebert

Feb 17, 2009, 3:40:29 AM
to Aaron Brady, pytho...@python.org

If I understand correctly, that only affects objects that are part of
a reference cycle and doesn't necessarily force the freed memory to be
released to the OS.
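
A small sketch of the distinction, for what it's worth -- gc.collect()
exists for reference cycles, which refcounting alone can never free:

import gc

x = []
x.append(x)         # self-referencing cycle
del x               # the cycle keeps the refcount above zero
print gc.collect()  # the cycle detector finds and frees it; acyclic
                    # objects are freed by refcounting before this runs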

Tim Wintle

Feb 17, 2009, 9:10:26 AM
to pytho...@python.org
On Tue, 2009-02-17 at 00:40 -0800, Chris Rebert wrote:
> >
> > 'gc.collect()' -- I believe, but I'm not the specialist in it.
>
> If I understand correctly, that only affects objects that are part of
> a reference cycle and doesn't necessarily force the freed memory to be
> released to the OS.

I believe that's correct.

If the OP is worrying about memory usage then they should also be aware
that Python's built-in types do a lot of very clever pre-allocation and
retention of memory to let them scale well, which can be confusing when
you're looking at memory usage.

Basically, malloc() and free() are computationally expensive, so Python
tries to call them as little as possible - but it's quite clever about
knowing what to do - e.g. if a list has already grown large then Python
assumes it might grow large again and keeps hold of a percentage of the
memory.

The outcome is that trying to reduce memory usage can change which data
structures you should use - tuples use less space than lists, etc.
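
For instance (a hedged illustration -- sys.getsizeof needs Python 2.6+,
and exact byte counts vary by platform and build):

import sys

items = range(1000)
print sys.getsizeof(list(items))   # list: header plus a separate pointer
                                   # array that may be over-allocated
print sys.getsizeof(tuple(items))  # tuple of the same items: a bit
                                   # smaller, items stored inline exactly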

Tim W

Floris Bruynooghe

Feb 17, 2009, 9:30:45 AM
to
On Feb 17, 5:31 am, Chris Rebert <c...@rebertia.com> wrote:
> My understanding is that for efficiency purposes Python hangs on to
> the extra memory even after the object has been GC-ed and doesn't give
> it back to the OS right away.

Even if Python did free() the space no longer used by its own memory
allocator (PyMem_Malloc(), PyMem_Free() & Co), the OS usually doesn't
return this space to the global free memory pool but instead leaves it
assigned to the process, again for performance reasons. Only when the
OS is running out of memory will it go and reclaim the free()ed memory
of processes. There might be a way to force your OS to do so earlier
if you really want, but I'm not sure how you'd do that.
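
One hedged possibility on Linux/glibc (my own guess, not something
Python exposes directly) is to ask the C allocator to trim its free
heap pages via ctypes:

import ctypes

libc = ctypes.CDLL("libc.so.6")  # glibc only; the name differs elsewhere
libc.malloc_trim(0)              # best effort: hands free heap pages
                                 # back to the OS, with no guarantees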

Regards
Floris

Christian Heimes

Feb 17, 2009, 10:59:33 AM
to pytho...@python.org
Yuanxin Xi wrote:
> Could anyone please explain why this happens? It seems some memory
> is not freed. I'm running into problems with this, as my program is
> very memory consuming and needs to frequently free some objects to
> reuse the memory. What is the best way to free the memory of b
> completely (i.e. back to the state before b was created)? I'm using
> Python 2.5.2

Python uses malloc() and free() to allocate memory on the heap. Most
malloc() implementations don't give back memory to the system. Instead
the memory segment is still assigned to the process. In order to give
back memory to the system pool, a memory manager has to use mapped
memory (mmap()) instead of increasing the heap by changing the data
segment size with brk(). This isn't a Python flaw but a general issue
with malloc() based memory management. [1]
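
If you control the allocation yourself, one sketch of a workaround is
to put very large transient buffers in an anonymous mmap, which is
unmapped -- and so returned to the OS -- as soon as you close it (the
size here is just illustrative):

import mmap

buf = mmap.mmap(-1, 512 * 1024 * 1024)  # anonymous 512 MB mapping
# ... fill and use buf ...
buf.close()  # pages go straight back to the OS, unlike brk()-heap memory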

By the way, Python has its own memory management system on top of the
system's malloc(). The memory arena system is explained in great
detail in the file obmalloc.c [2].

Christian

[1] http://en.wikipedia.org/wiki/Malloc#Implementations
[2]
http://svn.python.org/view/python/branches/release25-maint/Objects/obmalloc.c?revision=65261&view=markup

Christian Heimes

Feb 17, 2009, 11:04:12 AM
to pytho...@python.org
Tim Wintle wrote:
> Basically malloc() and free() are computationally expensive, so Python
> tries to call them as little as possible - but it's quite clever at
> knowing what to do - e.g. if a list has already grown large then python
> assumes it might grow large again and keeps hold of a percentage of the
> memory.

You are almost right. Python's mutable container types like lists have
a non-linear growth rate.

From the file listobject.c:

/*
This over-allocates proportional to the list size, making room
for additional growth. The over-allocation is mild, but is
enough to give linear-time amortized behavior over a long
sequence of appends() in the presence of a poorly-performing
system realloc().
The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
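
As a sketch, you can replay that rule in pure Python to reproduce the
pattern quoted above (my own throwaway code; note that, I believe, the
surrounding code then adds newsize itself to get the final allocation):

def grow(newsize):
    # head-room from the quoted line, plus newsize added afterwards
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)

allocated = 0
steps = []
for n in xrange(1, 100):
    if n > allocated:          # only realloc when we outgrow the slack
        allocated = grow(n)
        steps.append(allocated)
print steps   # [4, 8, 16, 25, 35, 46, 58, 72, 88, 106]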

Tim Wintle

Feb 18, 2009, 7:19:04 AM
to Christian Heimes, pytho...@python.org

Sorry, I think I didn't phrase myself very well - I was trying to
explain that de-allocation of memory follows different scaling
behaviour from allocation, so a large list that's shrunk is likely to
take more memory than a small list that's grown - i.e. the part just
above your quote:


/*
Bypass realloc() when a previous overallocation is large enough
to accommodate the newsize. If the newsize falls lower than half
the allocated size, then proceed with the realloc() to shrink the list.
*/
if (allocated >= newsize && newsize >= (allocated >> 1)) {
    assert(self->ob_item != NULL || newsize == 0);
    Py_SIZE(self) = newsize;
    return 0;
}


it's all very clever stuff btw.
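
A hedged way to watch that hysteresis from the Python side (needs 2.6+
for sys.getsizeof, and I believe the reported size tracks the allocated
array, so exact numbers will vary):

import sys

x = range(1000)         # built to its exact size
print sys.getsizeof(x)
del x[600:]             # newsize still >= half the allocation: no realloc
print sys.getsizeof(x)  # typically unchanged
del x[20:]              # now far below half: realloc shrinks the array
print sys.getsizeof(x)  # smaller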


David Niergarth

Feb 26, 2009, 9:38:34 AM
to
On Feb 16, 11:21 pm, Yuanxin Xi <xi11w...@gmail.com> wrote:
> Could anyone please explain why this happens?  It seems some memory
> are not freed.

There is a "bug" in versions of Python prior to 2.5 where memory
really isn't released back to the OS. Python 2.5 contains a new object
allocator that is able to return memory to the operating system,
fixing this issue. Here's an explanation:

http://evanjones.ca/python-memory-part3.html

What version of Python are you using? I have a machine running several
long-running processes, each of which occasionally spikes up to 500M
memory usage, although normally they only require about 25M. Prior to
2.5, those processes never released that memory back to the OS and I
would need to periodically restart them. With 2.5, this is no longer a
problem. I don't always see memory usage drop back down immediately
but the OS does recover the memory eventually. Make sure you use 2.5
if this is an issue for you.

--David

David Niergarth

Feb 26, 2009, 10:00:34 AM
to
Tim Peters showed a way to demonstrate the fix in

http://mail.python.org/pipermail/python-dev/2006-March/061991.html

> For simpler fun, run this silly little program, and look at memory
> consumption at the prompts:
>
> """
> x = []
> for i in xrange(1000000):
>     x.append([])
> raw_input("full ")
> del x[:]
> raw_input("empty ")
> """
>
> For example, in a release build on WinXP, VM size is about 48MB at the
> "full" prompt, and drops to 3MB at the "empty" prompt. In the trunk
> (without this patch), VM size falls relatively little from what it is
> at the "full" prompt (the contiguous vector holding a million
> PyObject* pointers is freed, but the obmalloc arenas holding a
> million+1 list objects are never freed).
>
> For more info about the patch, see Evan's slides from _last_ year's PyCon:
>
> http://evanjones.ca/memory-allocator.pdf

I'm not sure what deleting a slice accomplishes (del x[:]); the
behavior is the same whether I do del x or del x[:]. Any ideas?

--David

Steve Holden

Feb 26, 2009, 11:30:41 AM
to pytho...@python.org
del x removes the name x from the current namespace, garbage collecting
the object to which it referred. del x[:] leaves x referencing a cleared
list.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Terry Reedy

Feb 26, 2009, 1:22:12 PM
to pytho...@python.org
Steve Holden wrote:
> del x removes the name x from the current namespace, garbage collecting
> the object to which it referred.

If there is another reference to the list, which there well might be in
an actual application with memory problems, then 'del x' only
disassociates the name but the object and its large contents are not gc'ed.

> del x[:] leaves x referencing a cleared list.

which is guaranteed to be cleared, regardless of other refs.
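
A tiny sketch of the practical difference (throwaway code):

x = [1, 2, 3]
y = x       # a second reference, as in the scenario above
del x[:]    # empties the single shared list object
print y     # [] -- the clearing is visible through every reference,
            # and the items themselves can now be reclaimed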

Terry

Steve Holden

Feb 26, 2009, 1:40:04 PM
to pytho...@python.org
Terry Reedy wrote:
> Steve Holden wrote:
>> David Niergarth wrote:
[...]

>>> I'm not sure what deleting a slice accomplishes (del x[:]); the
>>> behavior is the same whether I do del x or del x[:]. Any ideas?
>>>
>> del x removes the name x from the current namespace, garbage collecting
>> the object to which it referred.
>
> If there is another reference to the list, which there well might be in
> an actual application with memory problems, then 'del x' only
> disassociates the name but the object and its large contents are not gc'ed.
>
>> del x[:] leaves x referencing a cleared list.
>
> which is guaranteed to be cleared, regardless of other refs.
>
Nice catch, Terry! You correctly spotted I was assuming that x
represented the only reference to the list.