Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Tree structure consuming lot of memory

0 views
Skip to first unread message

Chris Rebert

unread,
Jul 6, 2009, 6:03:00 AM7/6/09
to mayank gupta, pytho...@python.org
On Mon, Jul 6, 2009 at 2:55 AM, mayank gupta<moon...@gmail.com> wrote:
> Hi,
>
> I am creating a tree data-structure in python; with nodes of the tree
> created by a simple class :
>
> class Node :
>        def __init__(self , .... other attributes):
>               # initialise the attributes here!!
>
> But the problem is I am working with a huge tree (millions of nodes); and
> each node is consuming much more memory than it should. After a little
> analysis, I found out that in general it uses about 1.4 kb of memory for
> each node!!
> I will be grateful if someone could help me optimize the memory usage.

(1) Use __slots__ (see http://docs.python.org/reference/datamodel.html#slots)
(2) Use some data structure other than a tree
(3) Rewrite your Node/Tree implementation in C

Cheers,
Chris
--
http://blog.rebertia.com

Simon Forman

unread,
Jul 6, 2009, 1:15:22 PM7/6/09
to mayank gupta, pytho...@python.org
On Mon, Jul 6, 2009 at 6:12 AM, mayank gupta<moon...@gmail.com> wrote:
> Thanks for the other possibilites. I would consider option (2) and (3) to
> improve my code.
>
> But out of curiosity, I would still like to know why does an object of a
> Python-class consume "so" much of memory (1.4 kb), and this memory usage has
> nothing to do with its attributes.
>
> Thanks
>
> Regards.

For option 2 you should try using the built in types, list tuple or
dict. You might get better results.

I'm curious too as to why the class/instance code should take so much
memory, could you mention more about the code?

Antoine Pitrou

unread,
Jul 6, 2009, 3:58:26 PM7/6/09
to pytho...@python.org
mayank gupta <mooniitk <at> gmail.com> writes:
>
> After a little analysis, I found out that in general it uses about
> 1.4 kb of memory for each node!!

How did you measure memory use? Python objects are not very compact, but 1.4KB
per object seems a bit too much (I would expect more about 150-200 bytes/object
in 32-bit mode, or 300-400 bytes/object in 64-bit mode).

One of the solutions is to use __slots__ as already suggested. Another, which
will have similar benefits, is to use a namedtuple. Both suppress the instance
dictionnary (`instance`.__dict__), which is a major contributor to memory
consumption. Illustration (64-bit mode, by the way):

>>> import sys
>>> from collections import namedtuple

# First a normal class
>>> class Node(object): pass
...
>>> o = Node()
>>> o.value = 1
>>> o.children = ()
>>>
>>> sys.getsizeof(o)
64
>>> sys.getsizeof(o.__dict__)
280
# The object seems to take a mere 64 bytes, but the attribute dictionnary
# adds a whoppy 280 bytes and bumps actual size to 344 bytes!

# Now a namedtuple (a tuple subclass with property accessors for the various
# tuple items)
>>> Node = namedtuple("Node", "value children")
>>>
>>> o = Node(value=1, children=())
>>> sys.getsizeof(o)
72
>>> sys.getsizeof(o.__dict__)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Node' object has no attribute '__dict__'

# The object doesn't have a __dict__, so 72 bytes is its real total size.


Chris Rebert

unread,
Jul 6, 2009, 5:03:48 PM7/6/09
to mayank gupta, pytho...@python.org, Antoine Pitrou
On Mon, Jul 6, 2009 at 1:30 PM, mayank gupta<moon...@gmail.com> wrote:
> I worked out a small code which initializes about 1,000,000 nodes with some
> attributes, and saw the memory usage on my linux machine (using 'top'
> command). Then just later I averaged out the memory usage per node. I know
> this is not the most accurate way but just for estimated value.

You should try the more accurate sys.getsizeof() function:
http://docs.python.org/library/sys.html#sys.getsizeof

Cheers,
Chris

P.S. Please don't top-post in the future.
--
http://blog.rebertia.com

0 new messages