
map del efficiency python2.2 and 2.1


John Hunter

Jul 16, 2002, 11:08:00 AM

I have a file formatted like

word1 int1
word2 int2

with 800,000+ lines like that. I want to load that file into a map

m[word] = int

and my original implementation was quite slow. So I have been
experimenting with the most efficient way to load it and found that
the best way was to load the entire file as a string, split it, and
then iterate over the sequence with a stride of 2. With this approach
I was able to initialize the entire map in 3.5 - 4s (if anyone has any
further suggestions, please jump in, but I am reasonably satisfied
with this performance).

In the process of profiling, however, I found something surprising.
Most of the total execution time of the code is spent clearing the map
from memory, after all the lines in my script have executed (i.e., the
equivalent of calling __del__ on a dictionary type). In Python 2.1,
here is a typical execution time profile:

time to read file as string and split into seq: 1.000s
time to create map: 2.960s
total execution time: 9.87s

5.9s of the total runtime occurred *after* all the lines of my code
had executed. Since my program does nothing after reading the map, I
assume most of this time is spent freeing the map. If I comment out
the part of my program that creates the map, I do not have that long
delay at the end of the script.

Finally, this time is dramatically longer in python 2.2. Here are
typical runtimes in 2.2

time to read file as string and split into seq: 0.980s
time to create map: 2.650s
total execution time: 27.87s

These runtimes are highly consistent (the machine isn't doing much
else). So 2.2 does about 10% better in filling the map, but then
takes 24s to shut down.

Here's the script:

import time

def pairs(l):
    return [(l[i], l[i+1]) for i in range(0, len(l), 2)]

start = time.clock()

file = '/home/jdhunter/dict/meta/allwords.dat'
fh = open(file, 'r')

# this takes 0.97s
s = fh.read()
seq = s.split()
readTime = time.clock()-start
print 'seq has %d items; elapsed time is %1.3f' % \
    (len(seq), readTime)

m = {}
#this takes 13.9 s
#for (key, val) in pairs(seq):
#    m[key] = int(val)

#this takes 2.9s
for i in xrange(0,len(seq),2):
    m[seq[i]] = int(seq[i+1])

mapTime = time.clock() - readTime
print 'map has %d keys; time to load map is %1.3f' % \
    (len(m.keys()), mapTime)

#cruncher1:~/dict/test> time python2.1 test_allwords.py
#seq has 1712676 items; elapsed time is 1.000
#map has 856338 keys; time to load map is 2.960
#4.400u 0.480s 0:09.87 49.4% 0+0k 0+0io 298pf+0w

#cruncher1:~/dict/test> time python2.2 test_allwords.py
#seq has 1712676 items; elapsed time is 0.980
#map has 856338 keys; time to load map is 2.650
#13.390u 0.500s 0:27.87 49.8% 0+0k 0+0io 350pf+0w

John Hunter

Alex Martelli

Jul 16, 2002, 11:53:43 AM
John Hunter wrote:
...

> def pairs(l):
>     return [(l[i], l[i+1]) for i in range(0, len(l), 2)]

In 2.2, with "from __future__ import generators" at the start
of your module, you can probably use this almost as fast as
the other, less-elegant solution, by recoding it as a generator:

def pairs(L):
    for i in xrange(0, len(L), 2):
        yield L[i], L[i+1]

but that's a minor issue, I guess.

Another likely performance boost in 2.2 would be:

m = dict(pairs(seq))

with pairs coded as here shown.

Regarding del time, you could measure that directly by
doing a "del m" and seeing times before and after; also,
just in case (shouldn't matter, but...), you could try
gc.disable() and see if that makes a difference.
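A minimal sketch of that direct measurement (an illustrative aside, using
a synthetic stand-in dict and time.time() rather than time.clock()):

```python
import gc
import time

# Synthetic stand-in for the word map (not the real allwords data)
m = dict((str(i), i) for i in range(100000))

gc.disable()                  # just in case, per the suggestion above
t0 = time.time()
del m                         # refcounts drop; every entry is freed here
gc.collect()                  # flush anything the collector still tracks
elapsed = time.time() - t0
print('del took %.3fs' % elapsed)
gc.enable()
```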


Alex

Jonathan Hogg

Jul 16, 2002, 11:55:27 AM
On 16/7/2002 16:08, in article m28z4bn...@mother.paradise.lost, "John
Hunter" <jdhu...@nitace.bsd.uchicago.edu> wrote:

> In the process of profiling, however, I found something surprising.
> Most of the total execution time of the code is spent clearing the map
> from memory, after all the lines in my script have executed (ie, the
> equivalent of calling __del__ on a dictionary type. In python 2.1,
> here is a typical execution time profile
>
> time to read file as string and split into seq: 1.000s
> time to create map: 2.960s
> total execution time: 9.87s
>
> 5.9s of the total runtime occurred *after* all the lines of my code
> had executed. Since my program does nothing after reading the map, I
> assume most of this time is spent freeing the map. If I comment out
> the part of my program that creates the map, I do not have that long
> delay at the end of the script.

I don't think you're reading these statistics correctly. Looking at the
output you include:

> #cruncher1:~/dict/test> time python2.1 test_allwords.py
> #seq has 1712676 items; elapsed time is 1.000
> #map has 856338 keys; time to load map is 2.960
> #4.400u 0.480s 0:09.87 49.4% 0+0k 0+0io 298pf+0w

The numbers along the bottom line (as returned by time) represent: the
user-space seconds spent on the CPU, the kernel-space seconds spent on the
CPU, the actual "wallclock" runtime of the process, and thus the percentage
of the runtime that it was actually on the CPU (as opposed to waiting for
I/O, VM, or pre-empted by some other process).

These numbers seem pretty consistent: 4.4 + 0.48 = 4.88 secs spent on the
CPU. You account for this as 1.000 secs for the load of the sequence and
2.960 secs on the map creation, which leaves only 0.920 secs spent
initialising and tearing down the Python interpreter.
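As a quick sanity check on that arithmetic (a hypothetical recomputation,
not part of the original output):

```python
# Recomputing the fields of time(1) output "4.400u 0.480s 0:09.87 49.4%"
user, sys_, wall = 4.400, 0.480, 9.87
cpu = user + sys_                        # total seconds on the CPU
pct = 100.0 * cpu / wall                 # share of wallclock on the CPU
print('%.2fs CPU, %.1f%%' % (cpu, pct))  # 4.88s CPU, 49.4%
```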

> Finally, this time is dramatically longer in python 2.2. Here are
> typical runtimes in 2.2

I think there is something bogus with your 2.2 install. Even reading the
numbers correctly they still leave 10 secs of CPU time unaccounted for and
you'd definitely notice the Python interpreter wasting that much time
starting and stopping.

Ignoring the numbers, how does the program actually seem to you? With the
runtime in the region of 10 to 27 seconds you should be able to gauge where
the time is going by just counting in your head as it runs. Does it look
like it's spending many seconds finishing up?

I modified your script to explicitly delete 's', 'seq', and 'm' and time
this. On my (twice as slow) machine it took 2/3rds of a CPU second to
deallocate all three.

Jonathan

John Hunter

Jul 16, 2002, 3:07:02 PM
>>>>> "Alex" == Alex Martelli <al...@aleax.it> writes:
Alex> Another likely performance boost in 2.2 would be:

Alex> m = dict(pairs(seq))

Yes, very nice. This is the fastest yet. Using the generator pairs
func in the loop

for (key, val) in pairs(seq):
    m[key] = int(val)

is indeed a good bit faster than the list comp approach, but slower
than the xrange stride 2 ugly thingie. But dict(pairs(seq)) is
faster and cleaner than either. Thanks.
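For what it's worth, another pairing idiom (an editorial aside, not from
the thread) feeds the same iterator to zip() twice, which pairs
consecutive items without any indexing; zip() is documented to pull from
its arguments left to right:

```python
# Toy stand-in for the real word/count data
seq = 'word1 10 word2 20 word3 30'.split()
it = iter(seq)
# zip(it, it) pulls alternately from the one iterator, yielding
# ('word1', '10'), ('word2', '20'), ('word3', '30')
m = dict((word, int(count)) for word, count in zip(it, it))
print(m['word2'])  # 20
```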

Alex> Regarding del time, you could measure that directly by doing
Alex> a "del m" and seeing times before and after; also, just in
Alex> case (shouldn't matter, but...), you could try gc.disable()
Alex> and see if that makes a difference.

The reported time required for the 'del m' is minuscule, 0.2s.
Interestingly, though, if I do the 'del m', the extra wait at the end
of the python 2.2 run is dramatically reduced and comparable to that
in 2.1. gc.disable() did not have an appreciable effect, but after I
read your post I did a google groups search on garbage collection and
found that others have talked about an exponential increase in gc
times:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&threadm=mailman.1021733430.3746.python-list%40python.org&rnum=1&prev=/groups%3Fq%3Ddisable%2Bgarbage%2Bcollection%2Bpython%26ie%3DISO-8859-1%26hl%3Den%26btnG%3DGoogle%2BSearch

I'm still experimenting with this and if I figure anything more out,
I'll post.

John Hunter

John Hunter

Jul 16, 2002, 3:55:40 PM
>>>>> "Jonathan" == Jonathan Hogg <jona...@onegoodidea.com> writes:

Jonathan> Ignoring the numbers, how does the program actually seem
Jonathan> to you? With the runtime in the region of 10 to 27
Jonathan> seconds you should be able to gauge where the time is
Jonathan> going by just counting in your head as it runs. Does it
Jonathan> look like it's spending many seconds finishing up?

Thanks for the detailed info on the time output. I was referring to
the wall clock time and yes, the script does just sit there and tick
away doing nothing after the last print statement before my prompt
returns.

I cleared up one mystery. The reason my CPU % usage was so low is
that I was running the script on an NFS client and some of the lost
time was due to the executable loading over the network. But I still
can't explain why I am seeing the difference between 2.2 and 2.1 with
the script, which happens whether or not I run it on the NFS server or
client (and when running on the server I get near 100% CPU usage).
There is still an 8s differential that is tacked on at the end of the
2.2 run. As I pointed out to Alex, it clearly is related to how 2.2
is deleting the map, because the 2.1-2.2 difference disappears simply
by adding 'del m' to the end of my script.

Warning! Only masochists need continue reading.

Here are some more results from the wild and wonderful 2.1 vs 2.2. I
have modified the script to do the deletions explicitly and report
times. I'm also running on the NFS server. All these numbers are
repeatable and do not reflect some oddness on an individual run, since
I am getting near 100% CPU usage.

# With the del m, no difference between 2.2 and 2.1

mother:~/ox2/test> time python2.1 test_allwords.py
Read len 1712676 seq: 1.820s
Loaded map with 856338 keys: 4.740s
Deleted map: 0.280s
6.680u 0.770s 0:07.45 100.0% 0+0k 0+0io 299pf+0w

mother:~/ox2/test> time python2.2 test_allwords.py
Read len 1712676 seq: 1.880s
Loaded map with 856338 keys: 4.510s
Deleted map: 0.250s
6.460u 0.780s 0:07.30 99.1% 0+0k 0+0io 350pf+0w

# Comment out the del m line, big difference between 2.2 and 2.1

mother:~/ox2/test> time python2.1 test_allwords.py
Read len 1712676 seq: 1.850s
Loaded map with 856338 keys: 4.730s
6.690u 0.770s 0:07.46 100.0% 0+0k 0+0io 299pf+0w

mother:~/ox2/test> time python2.2 test_allwords.py
Read len 1712676 seq: 1.870s
Loaded map with 856338 keys: 4.520s
14.480u 0.810s 0:15.28 100.0% 0+0k 0+0io 350pf+0w

# Now let's get really weird. Add a 'del seq' and comment out 'del
# m'. Python 2.1 now craps out!

mother:~/ox2/test> time python2.1 test_allwords.py
Read len 1712676 seq: 1.820s
Loaded map with 856338 keys: 4.780s
Deleted seq: 0.200s
14.610u 0.880s 0:15.57 99.4% 0+0k 0+0io 299pf+0w

mother:~/ox2/test> time python2.2 test_allwords.py
Read len 1712676 seq: 1.880s
Loaded map with 856338 keys: 4.500s
Deleted seq: 0.200s
14.550u 0.770s 0:17.73 86.4% 0+0k 0+0io 350pf+0w

What is bizarre about this is that the addition of the del seq line to
the script causes this 8s lag between the last print and the end of
the program in 2.1.

Here then is the synopsis (excluded runs where %CPU < 99%):

           2.1    2.1    2.1    2.1
del m      yes    yes    no     no
del seq    yes    no     yes    no
run time   7.9s   7.6s   15.4s  7.4s

           2.2    2.2    2.2    2.2
del m      yes    yes    no     no
del seq    yes    no     yes    no
run time   7.6s   7.24s  15.6s  15.4s

OK, I'll stop torturing you and me now.

John Hunter

import time

class Timer:
    """Record events and how long they take; the time between
    events is recorded."""

    def __init__(self):
        self.last = time.clock()
        self.events = []

    def event(self, s):
        now = time.clock()
        self.events.append( (s, now - self.last) )
        self.last = now

    def __repr__(self):
        s = ''
        for event in self.events:
            s += '%s: %1.3fs\n' % event
        return s

timer = Timer()

file = '/home/jdhunter/ox2/meta/allwords.dat'


fh = open(file, 'r')

s = fh.read()
seq = s.split()
timer.event('Read len %d seq' % len(seq))

m = {}


for i in xrange(0,len(seq),2):
m[seq[i]] = int(seq[i+1])

timer.event('Loaded map with %d keys' % len(m.keys()))

# these are the del seq and del m lines that I commented or
# uncommented for the table above

del m
timer.event('Deleted map')

del seq
timer.event('Deleted seq')

print timer,

Tim Peters

Jul 16, 2002, 5:12:56 PM
[John Hunter]

> I have a file formatted like
>
> word1 int1
> word2 int2
>
> with 800,000+ lines like that. I want to load that file into a map
>
> m[word] = int
>
> and my original implementation was quite slow.

...

> In the process of profiling, however, I found something surprising.
> Most of the total execution time of the code is spent clearing the map
> from memory, after all the lines in my script have executed (ie, the
> equivalent of calling __del__ on a dictionary type.

...

Before trying anything else, rebuild with pymalloc enabled. We've seen many
cases where the platform free() ends up in seemingly quadratic-time behavior
when deleting gazillions of objects, presumably while trying to coalesce
them into larger free blocks. pymalloc usually fixes that. Would also be
interesting to try this under current CVS Python (where pymalloc is enabled
by default, and handles strings too).
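As a hedged aside for later readers: on a modern CPython you can ask the
build configuration whether pymalloc was compiled in (WITH_PYMALLOC is a
configure variable on Unix builds; it may be absent on other platforms):

```python
import sysconfig

# May be 1, 0, or None depending on platform and build
flag = sysconfig.get_config_var('WITH_PYMALLOC')
print(flag)
```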

Bengt Richter

Jul 16, 2002, 11:12:21 PM
On Tue, 16 Jul 2002 15:53:43 GMT, Alex Martelli <al...@aleax.it> wrote:

>John Hunter wrote:
> ...
>> def pairs(l):
>>     return [(l[i], l[i+1]) for i in range(0, len(l), 2)]
                                       ^^^^^


>
>In 2.2, with "from __future__ import generators" at the start
>of your module, you can probably use this almost as fast as
>the other, less-elegant solution, by recoding it as a generator:
>
>def pairs(L):
>    for i in xrange(0, len(L), 2):
              ^^^^^^
              ^
>        yield L[i], L[i+1]
>
>but that's a minor issue, I guess.
>
>Another likely performance boost in 2.2 would be:
>
>m = dict(pairs(seq))
>
>with pairs coded as here shown.

In case John didn't notice, that little 'x' your finger likely
typed automatically probably made some difference too ;-)

>
>Regarding del time, you could measure that directly by
>doing a "del m" and seeing times before and after; also,
>just in case (shouldn't matter, but...), you could try
>gc.disable() and see if that makes a difference.
>

The range(0, len(l), 2) would have been part of the garbage
to collect too, unless gc had already hit (in which case
gc.disable() might change timing in two places).

BTW, I wonder what the speed difference is in collecting
deleted numbers in the magic shared range(-1,100) vs. others,
given that it would just be decrementing refs until the end.
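A hedged illustration of those shared small ints (the cached range was
roughly -1..99 in the 2.x era and is -5..256 in current CPython; this is
an implementation detail, not a language guarantee):

```python
# int() keeps the values from being folded into shared constants
x = int('20')
y = int('20')
print(x is y)          # True on CPython: both names bind the cached 20
big_x = int('10000000001')
big_y = int('10000000001')
print(big_x == big_y)  # equal values, though typically distinct objects
```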

Regards,
Bengt Richter

Andrew McNamara

Jul 17, 2002, 12:11:33 AM
>This is exactly the kind of "seemingly random disaster" we see when the
>platform free() goes nuts, although it's usually much more pronounced than a
>measly factor of 2. If so, rebuilding with pymalloc is likely to fix it,
>while if you don't you'll never answer "why?" without a deep understanding
>of your platform C's implementation of free().

I recently got curious about the relative performance of various python
versions, new style classes and slots. I wrote a very simple benchmark
that allocated and deallocated a million small objects on a 600MHz P3
running linux 2.4.18 and glibc-2.2.4 with enough memory to prevent paging.

Results and code are below. It's clear that the system malloc hurts
badly for deallocation, but even with pymalloc deallocation is still
an expensive exercise (the attribute dictionary, presumably). Slots are
very efficient for small objects, in terms of CPU and memory footprint.

                 Create and
                 set attr  Set attr  Del+GC    RSS
                 --------  --------  --------  -------
py22 slots       20.3s     4.6s      19.5s     48M
py221 slots      21.0s     5.0s      21.5s     48M
pymalloc slots   16.2s     4.7s      0.7s      40M

py22 obj         65.9s     7.4s      401.5s    185M
py221 obj        62.1s     7.3s      492.8s    185M
pymalloc obj     57.9s     7.2s      125.8s    185M

py22 trad        55.8s     6.8s      275.8s    192M
py221 trad       56.8s     7.1s      284.7s    192M
pymalloc trad    52.6s     6.8s      115.0s    185M

py211 trad       52.6s     6.6s      239.4s    154M

With gc.disable(), and explicitly calling gc.collect():

                 Create and
                 set attr  Set attr  Del+GC    RSS
                 --------  --------  --------  -------
py22 slots       13.6s     4.6s      19.7s     48M
py221 slots      14.0s     5.0s      19.4s     48M
pymalloc slots   10.6s     4.7s      0.7s      40M

py22 obj         18.0s     7.3s      400.5s    185M
py221 obj        18.7s     7.6s      399.0s    185M
pymalloc obj     16.2s     7.2s      117.0s    185M

py22 trad        11.2s     6.8s      270.8s    192M
py221 trad       12.1s     7.2s      240.7s    192M
pymalloc trad    11.4s     6.8s      115.3s    185M

py211 trad       11.6s     6.5s      229.0s    154M

py152 trad       7.8s      3.1s      75.5s     123M


KEY
--------  -----------------------------------------------------------
slots     with new-style objects and using slots
obj       with new-style objects
trad      with traditional objects
py22      Python 2.2
py221     Python 2.2.1
pymalloc  Python 2.2.1 compiled with --with-pymalloc
py211     Python 2.1.1 with traditional objects
py152     Python 1.5.2 with traditional objects (no cyclic-GC)

----------------------------------------------------------------------------

import time, gc

iters = 1000000

def start():
    global ts
    ts = time.time()

def stop(msg):
    delta = time.time() - ts

    if delta < 0.00001:
        print msg, "took %.1fus" % (delta * 1000000)
    elif delta < 0.01:
        print msg, "took %.1fms" % (delta * 1000)
    else:
        print msg, "took %.1fs" % (delta)

try:
    class slots(object):
        __slots__ = ('a')

        def __init__(self, n):
            self.a = n

        def set(self, x):
            self.a = x

    class obj(object):
        def __init__(self, n):
            self.a = n

        def set(self, x):
            self.a = x

except NameError:
    pass

class trad:
    def __init__(self, n):
        self.a = n

    def set(self, x):
        self.a = x


def testit(ctor, count):
    print "testing", ctor.__name__

    start()
    a = map(ctor, xrange(count))
    gc.collect()
    stop(' create + setattr')

    start()
    aa = ctor(0)
    map(aa.set, xrange(count))
    gc.collect()
    stop(' setattr')

    start()
    del a
    gc.collect()
    stop(' del')

if 0:
    gc.disable()

if 0:
    testit(slots, iters)

if 0:
    testit(obj, iters)

if 1:
    testit(trad, iters)

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


Tim Peters

Jul 16, 2002, 11:54:43 PM
[John Hunter]
> ...

> Here then is the synopsis (excluded runs where %CPU < 99%)
> 2.1 2.1 2.1 2.1
> del m yes yes no no
> del seq yes no yes no
> run time 7.9s 7.6s 15.4s 7.4s
>
> 2.2 2.2 2.2 2.2
> del m yes yes no no
> del seq yes no yes no
> run time 7.6s 7.24s 15.6s 15.4s

This is exactly the kind of "seemingly random disaster" we see when the
platform free() goes nuts, although it's usually much more pronounced than a
measly factor of 2. If so, rebuilding with pymalloc is likely to fix it,
while if you don't you'll never answer "why?" without a deep understanding
of your platform C's implementation of free().

> OK, I'll stop torturing you and me now.

likewise<wink>-ly y'rs - tim

Tim Peters

Jul 17, 2002, 1:16:30 AM
[Andrew McNamara]
> ...

> I wrote a very simple benchmark that allocated and deallocated a million
> small objects on a 600MHz P3 running linux 2.4.18 and glibc-2.2.4 with
> enough memory to prevent paging.
>
> Results and code are below. It's clear that the system malloc hurts
> badly for deallocation, but even with pymalloc deallocation is still
> an expensive exercise (the attribute dictionary, presumably). Slots are
> very efficient for small objects, in terms of CPU and memory footprint.

pymalloc in (all flavors of) 2.2 was still an experimental feature (that's
why it's disabled by default), and wasn't tuned for the way new-style
classes work. In particular, pymalloc's "small object threshold" is much
too low in the 2.2 line, so low that *no* instance dict can get handled by
it, and consequently it still passes on lots of work to the system
malloc/free. The strong benefit you got from it is more indirect than you
may think; your platform free() appears to be truly dreadful when dealing
with lots of small objects.

Under current CVS Python (what will eventually be Python 2.3), pymalloc is
tuned properly for new-style classes, and the deletion times are minor on my
home Win98SE 866MHz box (the del times below are 100x smaller than you were
seeing with 221+pymalloc); they should become minor on your box too, as
pymalloc's code is the same on all platforms, while its few hardcoded
assumptions about OS paging are geared more toward Unix than Windows:

C:\Code\python\PCbuild>python -O temp.py
testing slots
 create + setattr took 13.1s
 setattr took 1.9s
 del took 0.6s
testing obj
 create + setattr took 36.6s
 setattr took 2.1s
 del took 1.1s
testing trad
 create + setattr took 30.8s
 setattr took 2.4s
 del took 1.0s

This was leaving gc enabled, but changing your testit() to

def testit(ctor, count):
    print "testing", ctor.__name__

    start()
    a = map(ctor, xrange(count))
    stop(' create + setattr')

    gc.collect()

    start()
    aa = ctor(0)
    map(aa.set, xrange(count))
    stop(' setattr')

    del aa
    gc.collect()

    start()
    del a
    gc.collect()
    stop(' del')

because it just confuses things to time explicit gc.collect() calls too
(except when you're specifically trying to time collection, as you may have
intended to do in your " del" test). gc is a bit zippier in current CVS
too.

Andrew McNamara

Jul 17, 2002, 1:54:57 AM
>Under current CVS Python (what will eventually be Python 2.3), pymalloc is
>tuned properly for new-style classes, and the deletion times are minor on my
>home Win98SE 866MHz box (the del times below are 100x smaller than you were
>seeing with 221+pymalloc); they should become minor on your box too, as
>pymalloc's code is the same on all platforms, while its few hardcoded
>assumptions about OS paging are geared more toward Unix than Windows:

That's what I wanted to hear! I had just finished reading your CVS
check-in comments and was building and testing the CVS python. Thanks
for the fine work.

>because it just confuses things to time explicit gc.collect() calls too
>(except when you're specifically trying to time collection, as you may have
>intended to do in your " del" test).

I thought it only fair that explicit GC be included in the earlier tests
as well - as it is when autoGC is enabled.

BTW, I always ran the tests one at a time (with only one "if" selected)
- running more than one test at a time resulted in wildly fluctuating
times - presumably due to fragmentation of the heap.

>gc is a bit zippier in current CVS too.

More good news. Thanks. 8-)

Here's the table again with CVS Python included, for those that care:

                 Create and
                 set attr  Set attr  Del+GC    RSS
                 --------  --------  --------  -------
py22 slots       20.3s     4.6s      19.5s     48M
py221 slots      21.0s     5.0s      21.5s     48M
pymalloc slots   16.2s     4.7s      0.7s      40M
pyCVS slots      17.8s     5.0s      0.8s      40M

py22 obj         65.9s     7.4s      401.5s    185M
py221 obj        62.1s     7.3s      492.8s    185M
pymalloc obj     57.9s     7.2s      125.8s    185M
pyCVS obj        50.6s     6.8s      1.5s      185M

py22 trad        55.8s     6.8s      275.8s    192M
py221 trad       56.8s     7.1s      284.7s    192M
pymalloc trad    52.6s     6.8s      115.0s    185M
pyCVS trad       44.2s     6.2s      1.3s      185M

py211 trad       52.6s     6.6s      239.4s    154M

With gc.disable(), and explicitly calling gc.collect():

                 Create and
                 set attr  Set attr  Del+GC    RSS
                 --------  --------  --------  -------
py22 slots       13.6s     4.6s      19.7s     48M
py221 slots      14.0s     5.0s      19.4s     48M
pymalloc slots   10.6s     4.7s      0.7s      40M
pyCVS slots      17.8s     5.0s      0.8s      44M

py22 obj         18.0s     7.3s      400.5s    185M
py221 obj        18.7s     7.6s      399.0s    185M
pymalloc obj     16.2s     7.2s      117.0s    185M
pyCVS obj        13.8s     6.8s      1.5s      185M

py22 trad        11.2s     6.8s      270.8s    192M
py221 trad       12.1s     7.2s      240.7s    192M
pymalloc trad    11.4s     6.8s      115.3s    185M
pyCVS trad       9.2s      6.3s      1.3s      185M

py211 trad       11.6s     6.5s      229.0s    154M

py152 trad       7.8s      3.1s      75.5s     123M


KEY
--------  -----------------------------------------------------------
slots     with new-style objects and using slots
obj       with new-style objects
trad      with traditional objects
pyCVS     Python CVS (aka 2.3, pymalloc on by default)
pymalloc  Python 2.2.1 compiled with --with-pymalloc
py221     Python 2.2.1
py22      Python 2.2
py211     Python 2.1.1 with traditional objects
py152     Python 1.5.2 with traditional objects (no cyclic-GC)

Alex Martelli

Jul 17, 2002, 3:32:16 AM
Bengt Richter wrote:

> On Tue, 16 Jul 2002 15:53:43 GMT, Alex Martelli <al...@aleax.it> wrote:
>
>>John Hunter wrote:
>> ...
>>> def pairs(l):
>>>     return [(l[i], l[i+1]) for i in range(0, len(l), 2)]
>                                        ^^^^^

...


>>def pairs(L):
>>    for i in xrange(0, len(L), 2):
>              ^^^^^^
>              ^
>>        yield L[i], L[i+1]
>>
>>but that's a minor issue, I guess.
>>
>>Another likely performance boost in 2.2 would be:
>>
>>m = dict(pairs(seq))
>>
>>with pairs coded as here shown.
>
> In case John didn't notice, that little 'x' your finger likely
> typed automatically probably made some difference too ;-)

Yep, but John had used xrange in other spots of his code,
so I didn't think I needed to explain better. Still, that
may have been an error on my part. So, the better explanation:
range and list comprehensions build lists and thus may have
to allocate lots of memory. xrange and generators don't build
any list but rather return one item at a time as needed.

But the importance of using xrange vs range is minor -- yes
it does make SOME difference, but it's small. See:

from __future__ import generators
import time, gc

L = range(300 * 1000)

def timit(f):
    start = time.clock()
    m = dict(f(L))
    stend = time.clock()
    print f.__name__, stend-start

def pairs_lc_ra(L):
    return [ (L[i],L[i+1]) for i in range(0, len(L), 2) ]

def pairs_lc_xr(L):
    return [ (L[i],L[i+1]) for i in xrange(0, len(L), 2) ]

def pairs_ge_ra(L):
    for i in range(0, len(L), 2):
        yield L[i], L[i+1]

def pairs_ge_xr(L):
    for i in range(0, len(L), 2):
        yield L[i], L[i+1]

gc.disable()

for i in range(3):
    for f in pairs_lc_ra, pairs_lc_xr, pairs_ge_ra, pairs_ge_xr:
        timit(f)


[alex@lancelot MyDocuments]$ python pa.py
pairs_lc_ra 0.53
pairs_lc_xr 0.52
pairs_ge_ra 0.29
pairs_ge_xr 0.28
pairs_lc_ra 0.52
pairs_lc_xr 0.52
pairs_ge_ra 0.28
pairs_ge_xr 0.28
pairs_lc_ra 0.52
pairs_lc_xr 0.52
pairs_ge_ra 0.28
pairs_ge_xr 0.28

This is with 2.2.1. 2.3 built from CVS lowers generator's advantage:

[alex@lancelot src]$ ./python pa.py
pairs_lc_ra 0.44
pairs_lc_xr 0.4
pairs_ge_ra 0.29
pairs_ge_xr 0.28
pairs_lc_ra 0.43
pairs_lc_xr 0.4
pairs_ge_ra 0.29
pairs_ge_xr 0.29
pairs_lc_ra 0.43
pairs_lc_xr 0.39
pairs_ge_ra 0.29
pairs_ge_xr 0.29

but even here, using generators rather than list comprehensions
seems still much more important than using xrange rather than range.


Alex


Bengt Richter

Jul 17, 2002, 5:04:55 AM

(The range in pairs_ge_xr above needs an 'x', but doesn't make much
difference.)

Yes. Looks like about a ~5% advantage for 'x' and ~100% for 'ge' on my
old machine (P2-300MHz/320MB with
Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32):

pairs_lc_ra 2.12676607592
pairs_lc_xr 2.02671428164
pairs_ge_ra 1.00829043683
pairs_ge_xr 0.942929913458
pairs_lc_ra 2.11218992576
pairs_lc_xr 2.07914132127
pairs_ge_ra 1.00314620904
pairs_ge_xr 0.959043968146
pairs_lc_ra 2.0945639475
pairs_lc_xr 2.22773139387
pairs_ge_ra 1.03221134747
pairs_ge_xr 0.952561302467

Thanks for the demo.

Regards,
Bengt Richter

John Hunter

Jul 17, 2002, 9:56:27 AM
>>>>> "Tim" == Tim Peters <tim...@comcast.net> writes:

Tim> This is exactly the kind of "seemingly random disaster" we
Tim> see when the platform free() goes nuts, although it's usually
Tim> much more pronounced than a measly factor of 2. If so,
Tim> rebuilding with pymalloc is likely to fix it, while if you
Tim> don't you'll never answer "why?" without a deep understanding
Tim> of your platform C's implementation of free().

pymalloc to the rescue. I rebuilt 2.2 --with-pymalloc and the
interminable delay was vanquished.

Now where do I go to get a deep understanding of my platform's (glibc
2.2.2) implementation of free and why it behaves so badly?

Thanks for the help
John Hunter

Michael Hudson

Jul 17, 2002, 12:28:10 PM
John Hunter <jdhu...@nitace.bsd.uchicago.edu> writes:

> Now where do I go to get a deep understanding of my platform's (glibc
> 2.2.2) implementation of free and why it behaves so badly?

I believe 2.3 will be better here. Not a lot of help, no...

Cheers,
M.

--
The Programmer's Quick Guide To Python (Time Machine version):
You try to shoot yourself in the foot, only to realize that
there's no need, since Guido thoughtfully shot you in the foot
years ago. -- Nick Mathewson, comp.lang.python

Tim Peters

Jul 17, 2002, 3:45:46 PM
[John Hunter]

> pymalloc to the rescue. I rebuilt 2.2 --with-pymalloc and the
> interminable delay was vanquished.

Cool! It will be even better in 2.3 (see other posts in this thread).

> Now where do I go to get a deep understanding of my platform's (glibc
> 2.2.2) implementation of free and why it behaves so badly?

Study the source code, watch it in a debugger until your eyes bleed, and/or
try to engage its current maintainer(s) in discussion. Python grew its own
pymalloc because platform mallocs are a diverse, finicky, and ever-moving
target -- it would truly be a full-time job to keep up-to-date on the subtle
quirks and disaster-modes of all of them, but now we only have to understand
ours. And that's only a half-time job <wink>.

Michael Hudson

Jul 18, 2002, 5:45:03 AM
Michael Hudson <m...@python.net> writes:

> John Hunter <jdhu...@nitace.bsd.uchicago.edu> writes:
>
> > Now where do I go to get a deep understanding of my platform's (glibc
> > 2.2.2) implementation of free and why it behaves so badly?
>
> I believe 2.3 will be better here. Not a lot of help, no...

By this I meant glibc 2.3! Python 2.3 will also be better, but for a
different reason.

Cheers,
M.

--
I'm a keen cyclist and I stop at red lights. Those who don't need
hitting with a great big slapping machine.
-- Colin Davidson, cam.misc

Daniel Yoo

May 14, 2004, 6:51:22 PM
Tim Peters <tim...@comcast.net> wrote:
: [John Hunter]