
Memory sizes of python objects?


Clark C. Evans
Mar 23, 2002, 5:43:22 PM
Hello. I'm trying to figure out how much memory overhead
is incurred when using strings, tuples, lists, and maps.
In particular, say I have a nested structure like...

entry = (92939938, "This is one of thousands of map entries",
         """
It even has multi-line data in it, appx 400
characters per entry.
""", "2002-31-02 12:20 +500", "Dogmeat")
dict = {"Dogmeat": entry,
        # about a thousand more "indexed" entries
       }
lst = [entry,
       # about a thousand more entries
      ]

etc. Is there some rule of thumb that I can use to estimate this?
For example: take the character data you have and multiply by a
factor of 4 to find the in-memory footprint of tuples, and each
map is 1K plus 64 bytes per entry?

Best,

Clark

Martin v. Loewis
Mar 24, 2002, 4:10:48 AM
"Clark C . Evans" <c...@clarkevans.com> writes:

> Hello. I'm trying to figure out how much memory overhead
> is incurred when using strings, tuples, lists, and maps.
> In particular, say I have a nested structure like...

To find out such things, I recommend studying the Python source
code, in particular the header files. For a worked example of such
a computation, please read

http://groups.google.de/groups?hl=de&selm=j43d3m93iu.fsf%40informatik.hu-berlin.de

> etc. Is there some rule of thumb that I can use to estimate,
> for example, take the character data you have and multiply by
> a factor of 4 to find the in-memory footprint of tuples, and
> each map is 1K plus 64 bytes per entry..

For a rule of thumb, you should be aware of the per-object overhead
and the pointer size on your system. The per-object overhead consists
of three pieces (assuming a 32-bit system):

- the Python per-object overhead: 8 bytes for fixed size objects
(e.g. integers), and 12 bytes for variable-sized objects (strings,
tuples)

- the garbage collector overhead: 8 bytes for containers
(e.g. tuples), nothing for non-containers (strings)

- the malloc overhead: varies widely by platform, but it is 8 bytes in
most cases. In addition, malloc will usually round up the object
size to a multiple of 8.

In addition, you have the per-content overhead, which varies with the
type of object:

- 4 bytes for an integer
- 1 byte per character in a string, plus one for the terminating 0
- 4 bytes per element in a list or tuple (notice that lists allocate
space for more elements in advance)

Dictionaries are more difficult to count; their size also varies with
the Python version. The dictionary object itself is one memory block:
including gc and object overhead (but not counting the malloc
overhead), it takes 144 bytes in Python 2.2. The array of entries
is another memory block, which takes 12 bytes per entry.

Notice that dictionaries overallocate entries, so you'll always
allocate space for more entries than the dictionary actually holds.
Also, "small dictionaries" (fewer than 9 entries) don't need any
extra space, since they store their entries within the 144 bytes.
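[For comparison: later Python versions expose these numbers directly via
sys.getsizeof() (added in Python 2.6, so not available when this was
written). The exact byte counts differ from the 32-bit Python 2.2 figures
above, but the shape of the overhead is the same. A minimal sketch:]

```python
import sys

# Per-object sizes as reported by the interpreter itself.  Exact numbers
# vary by Python version and platform (pointer size, struct layout).
print(sys.getsizeof(12345))                          # an int: header + digits
print(sys.getsizeof(""), sys.getsizeof("abcdefgh"))  # strings grow per character
print(sys.getsizeof(()), sys.getsizeof((1, 2, 3)))   # tuples grow per pointer

# Note: getsizeof counts only the object itself, not the objects it
# references -- a tuple's strings and ints must be counted separately.
entry = (92939938, "x" * 400, "Dogmeat")
print(sys.getsizeof(entry))
```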

Regards,
Martin

Erno Kuusela
Mar 24, 2002, 12:00:35 PM
In article <mailman.1016923416...@python.org>, "Clark C
. Evans" <c...@clarkevans.com> writes:

| Is there some rule of thumb that I can use to estimate,
| for example, take the character data you have and multiply by
| a factor of 4 to find the in-memory footprint of tuples, and
| each map is 1K plus 64 bytes per entry..

make a zillion of them, check memory usage increase with ps (or
equivalent on your platform), and divide by zillion.
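[in later pythons (3.4+) the same divide-by-a-zillion measurement can be
done in-process with the tracemalloc module instead of ps; a sketch:]

```python
import tracemalloc

N = 100_000
tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
objs = [("payload %d" % i, i + 10**6) for i in range(N)]  # make a zillion of them
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

# divide by zillion; counts the list slot, the tuple, the str, and the int
print("approx bytes per entry:", (after - before) / N)
```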

-- erno

Peter Hansen
Mar 24, 2002, 12:02:24 PM

Keeping in mind Python's propensity for finding already-existing
entities and binding to them instead of creating new objects.

The difference in memory consumption between creating a list
of size 1,000,000 filled with 0's and the same size list filled
with integers from 0 to 999,999 is rather large...
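[The effect can be seen directly by counting distinct object identities
— a CPython-specific sketch, since interning and the small-int cache are
implementation details, not language guarantees:]

```python
zeros = [0] * 1000                          # 1000 references to ONE int object
distinct = [i + 1000 for i in range(1000)]  # 1000 separate int objects

# Both lists hold 1000 pointers, but the objects behind them differ:
print(len({id(x) for x in zeros}))      # 1 -- every slot shares one object
print(len({id(x) for x in distinct}))   # 1000 -- one object per slot
```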

-Peter

Erno Kuusela
Mar 24, 2002, 1:24:35 PM
In article <3C9E06A0...@engcorp.com>, Peter Hansen
<pe...@engcorp.com> writes:

you make it sound more mysterious than it is - only numbers under
100 and identifier-like string literals get interned, afaik.

-- erno

Peter Hansen
Mar 24, 2002, 2:40:17 PM
Erno Kuusela wrote:
>
> In article <3C9E06A0...@engcorp.com>, Peter Hansen

> | Keeping in mind Python's propensity for finding already-existing
> | entities and binding to them instead of creating new objects.
>
> | The difference in memory consumption between creating a list
> | of size 1,000,000 filled with 0's and the same size list filled
> | with integers from 0 to 999,999 is rather large...
>
> you make it sound more mysterious than it is - only numbers under
> 100 and identifier-like string literals get interned, afaik.

So you mean if I do the following, the first one produces only
a million references to the value 51, while the second one produces
a million instances of 102 plus the million individual references?
I don't get that behaviour. On my machine, with Python 2.2, for
all intents and purposes the memory allocated stays constant.
Did I overlook something?

>>> b = 1
>>> a = [0] * 1000000
>>> for i in xrange(len(a)): a[i] = b*51
...
>>> b = 2
>>> for i in xrange(len(a)): a[i] = b*51
...

-Peter

Erno Kuusela
Mar 24, 2002, 5:39:48 PM
In article <3C9E2BA1...@engcorp.com>, Peter Hansen
<pe...@engcorp.com> writes:

| I don't get that behaviour. On my machine, with Python 2.2, for
| all intents and purposes the memory allocated stays constant.
| Did I overlook something?

hard to say. how did you measure it?

with the following program..

import os

a = [0] * 1000000

if 0:
    b = 1
    for i in xrange(len(a)): a[i] = b*51
else:
    b = 2
    for i in xrange(len(a)): a[i] = b*51

os.system("ps u %d" % os.getpid())

i get this:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
erno 2664 41.3 3.2 39680 33680 pts/25 S 00:37 0:05 python paa.py

changing the "if 0" to "if 1" gets me

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
erno 2691 44.2 0.9 15712 9712 pts/25 S 00:37 0:05 python paa.py

the difference is about 24 megabytes, so i'd say an integer object
takes up 24 bytes on this platform.
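[on a modern 64-bit CPython the analogous per-int figure can be read off
directly with sys.getsizeof; a sketch — the exact number varies by version
and platform, e.g. 28 bytes rather than 24:]

```python
import sys

print(sys.getsizeof(102))      # a small int object (e.g. 28 bytes on 64-bit CPython 3.x)
print(sys.getsizeof(2**1000))  # arbitrary-precision ints grow with magnitude
```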

-- erno

Michael Hudson
Mar 25, 2002, 8:46:25 AM
Erno Kuusela <erno...@erno.iki.fi> writes:

> | The difference in memory consumption between creating a list
> | of size 1,000,000 filled with 0's and the same size list filled
> | with integers from 0 to 999,999 is rather large...
>
> you make it sound more mysterious than it is - only numbers under
> 100 and identifier-like string literals get interned, afaik.

One-character strings, too. Also the empty string and the empty tuple.
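These caches are easy to observe with identity checks (CPython-specific
behaviour, not guaranteed by the language):

```python
# Values are built at runtime so the compiler can't fold them into shared
# constants; any sharing below comes from interpreter caches.
a, b = tuple(), tuple()
print(a is b)                    # True: the empty tuple is a singleton

s, t = str(), str()
print(s is t)                    # True: so is the empty string

c, d = chr(97), chr(97)
print(c is d)                    # True: one-character strings are cached

m, n = int("100"), int("100")
print(m is n)                    # True: small ints are cached

p, q = int("1000"), int("1000")
print(p is q)                    # False: larger ints are fresh objects
```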

M.

--
I'm sorry, was my bias showing again? :-)
-- William Tanksley, 13 May 2000

Peter Hansen
Mar 25, 2002, 7:19:22 PM
Erno Kuusela wrote:
>
> In article <3C9E2BA1...@engcorp.com>, Peter Hansen
> <pe...@engcorp.com> writes:
>
> | I don't get that behaviour. On my machine, with Python 2.2, for
> | all intents and purposes the memory allocated stays constant.
> | Did I overlook something?
>
> hard to say. how did you measure it?

[w/red face] With Windows.... I'll think twice about checking
something like this on that p3e of s2t operating system next time...
Thanks Erno.

-Peter
