List size hints

James Preston

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Greetings all,

import possibleSillyQuestion

Is there any way to give Python RTE hints as the expected size of
an array or list, or to get the reallocation to occur in larger chunks?

C:\TEMP>c:\"program files"\python\python.exe
Python 1.5.1 (#0, Apr 13 1998, 20:22:04) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import array
>>> a = array.array('h')
>>> a.append(2)
>>> a.buffer_info()
(7936480, 1)
>>> a.append(3)
>>> a.buffer_info()
(7937504, 2)
>>>

This problem occurred to me while trying to speed up a generating
routine
that finds primes < 20k. Almost all of the run time was the append(). I
found a 3X speed increase by writing to a _file_ instead of a
list/array.
I found that unsettling. ;-)

Pointer appreciated. Thanks.

James Preston

Janko Hauser

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

If you insist in using array and don't want to use NumPy, there is a
second parameter for array.array

>>> a=array.array('h',[10]*10)
>>> a
array('h', [10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> a[9]=9
>>> a
array('h', [10, 10, 10, 10, 10, 10, 10, 10, 10, 9])

In NumPy you can do:
>>> from Numeric import *
>>> a=zeros(100,'i')

__Janko

Janko Hauser

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Uh, sorry there was a mistake in my last post, it's early in the
morning.

>>> from Numeric import *
>>> a=zeros((100,),'i')

__Janko

Tim Peters

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

[James Preston]
> ...

> Is there any way to give Python RTE hints as the expected size of
> an array or list, or to get the reallocation to occur in larger chunks?

The only "hint" you can give is to allocate the size you want explicitly,
then keep track of the current index yourself; e.g.,

mylistsize = 5
mylist = [None] * mylistsize
i = 0

then manipulate i instead of appending. Rarely helpful, though.

> C:\TEMP>c:\"program files"\python\python.exe
> Python 1.5.1 (#0, Apr 13 1998, 20:22:04) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import array
> >>> a = array.array('h')
> >>> a.append(2)
> >>> a.buffer_info()
> (7936480, 1)
> >>> a.append(3)
> >>> a.buffer_info()
> (7937504, 2)
> >>>

Actually, that says less than you may think <wink>:

Python 1.5.2b1 (#0, Dec 10 1998, 11:29:56) [MSC 32 bit (Intel)] on win32

>>> import array
>>> a = array.array('h')

>>> adr = {}
>>> for i in range(100000):
... a.append(12)
... adr[a.buffer_info()[0]] = 1
...
>>> len(adr)
134
>>>

That is, while appending 100,000 elements one at a time, there were only 134
distinct start-of-buffer addresses. In general, the bigger an array (or
list) gets, the less frequently reallocations occur. Can hurt to grow more
than one list at a time frequently, though.

> This problem occurred to me while trying to speed up a generating
> routine that finds primes < 20k. Almost all of the run time was
> the append(). I found a 3X speed increase by writing to a _file_
> instead of a list/array. I found that unsettling. ;-)

So do I <frown>. You should post the code -- something's fishy. Also
bothered by your use of "list/array" -- they're not the same thing, and
arrays are generally slower than lists (in return for which they consume
less storage). Finally, if it's taking more than a couple seconds to find
all the primes under 20K, what you really need is a better algorithm <0.9
wink>. I.e., that it's slow enough that you're bothering to time it at all
is fishy too.

I'll attach a simple sieve with a timer; takes about 1/3 second to generate
the primes < 20,000 on a P5-166 under Win95 and Python 1.5.2b1. If I
replace the list append with a file write, it's slower -- which is what I
expected.

could-be-you're-just-haunted<wink>-ly y'rs - tim

def sieve(n):
if n < 2:
return []
candidate = range(n+1)
primes = [2]
p = 1
while 1:
p = p + 2
while p <= n and candidate[p] == 0:
p = p + 2
if p > n:
break
primes.append(p)
for pmult in xrange(p*p, n+1, 2*p):
candidate[pmult] = 0
return primes

def timeit(f):
from time import clock
for i in range(3):
start = clock()
x = f(20000)
finish = clock()
print len(x), finish - start

timeit(sieve)

Guido van Rossum

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

> [James Preston]
> > ...
> > Is there any way to give Python RTE hints as the expected size of
> > an array or list, or to get the reallocation to occur in larger chunks?

[Tim Peters]

> The only "hint" you can give is to allocate the size you want explicitly,
> then keep track of the current index yourself; e.g.,

[...]

[Tim goes on to show that arrays get relocated infrequently, and muses
that this may change when you extend multiple arrays simultaneously.]

Note that *lists* (but, because their implementation is separate and
much less under scrutiny, *not* arrays) actually make sure that in the
long run they don't get resized more often than every 100 appends.
It uses the following routine to decide how much memory to allocate
for a list of n elements:

static int
roundupsize(n)
int n;
{
if (n < 500)
return ROUNDUP(n, 10);
else
return ROUNDUP(n, 100);
}

(A better algorithm could be substituted easily, but this seems a good
compromise given that both small lists and large lists need to be
catered for.)

For very large lists, I expect that the effect observed by Tim Peters
for arrays will also help -- the realloc() routine will eventually
move a growing memory block to a place in memory where it has plenty
of space to grow. (Using the buffer_info() method it's easy to
experiment with this under various circumstances.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

Christian Tismer

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Tim Peters wrote:
[many things, also a sieve]

funny, I didn't realize that you wrote
a sieve already. Now I don't want to drop mine.

Here it is, just a few ticks faster (5-10%), and it doesn't
generate an integer overflow.

def tissieve(n):
candidates = range(3, n+1, 2)
for p in candidates:
if not p: continue
for q in xrange(3*p, n+1, 2*p):
candidates[(q-3)/2] = 0
return [2] + filter(None, candidates)

If found it cheaper to build the xrange from 3*p instead
of p*p, but not having to check for overflow.

Anyway, I'm off topic since list appends are not touched :-)

tissieve(10**6) gives 78498 primes in 8.8 secs,
timsieve is nearly there after 9.4 secs.
(Not intending to race on primes :c)

ciao - chris

--
Christian Tismer :^) <mailto:tis...@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net
10553 Berlin : PGP key -> http://pgp.ai.mit.edu/
we're tired of banana software - shipped green, ripens at home

Gordon McMillan

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Christian wrote:

> Tim Peters wrote:
> [many things, also a sieve]

...

> Here it is, just a few ticks faster (5-10%), and it doesn't
> generate an integer overflow.

...

> tissieve(10**6) gives 78498 primes in 8.8 secs,
> timsieve is nearly there after 9.4 secs.
> (Not intending to race on primes :c)

The gauntlet has been thrown!

I've got Tim at 5:4, (but only because Christian has a family to
support).

Any takers?

- Gordon

Christian Tismer

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Well, thank you for saving my family, but too late, I already
tried and got Tim by nearly 3:2 (tissieve2).
BTW I changed Tim's code back to its original, but added
a test for overflow.

I claim tissieve2 is hard to beat :c)

tissieve1 78498 8.682
tissieve2 78498 4.354
timsieve 78498 6.62

happy new year - chris

-------- sieve.py ---------
def tissieve1(n):

candidates = range(3, n+1, 2)
for p in candidates:
if not p: continue
for q in xrange(3*p, n+1, 2*p):
candidates[(q-3)/2] = 0
return [2] + filter(None, candidates)

def tissieve2(n):

candidates = range(3, n+1, 2)

sq = int(n**0.5)
for p in candidates:
if p and p<sq:
for q in xrange(((p*p)-3)/2, ( (n+1)-3)/2, p):\
candidates[q] = 0

return [2] + filter(None, candidates)

def timsieve(n):

if n < 2:
return []
candidate = range(n+1)
primes = [2]
p = 1

sq = int(n**0.5)

while 1:
p = p + 2
while p <= n and candidate[p] == 0:
p = p + 2
if p > n:
break
primes.append(p)

if p < sq:

for pmult in xrange(p*p, n+1, 2*p):
candidate[pmult] = 0
return primes

def timeit(f):
from time import clock

for i in range(1):
start = clock()
x = f(1000000)
finish = clock()
print f.__name__, len(x), round(finish - start, 3)

timeit(tissieve1)
timeit(tissieve2)
timeit(timsieve)
---------------------------

Gordon McMillan

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Christian falls for it:
...

> Well, thank you for saving my family, but too late, I already
> tried and got Tim by nearly 3:2 (tissieve2).
> BTW I changed Tim's code back to its original, but added
> a test for overflow.

Now Christian, surely you know that Tim, who already has
difficulty avoiding optimiziing to absurdity, will pursue this with
all the fervor that Geraldo would devote to finding all the *real*
blondes in Salt Lake City.

> I claim tissieve2 is hard to beat :c)

...

> def tissieve2(n):
> candidates = range(3, n+1, 2)
> sq = int(n**0.5)
> for p in candidates:
> if p and p<sq:
> for q in xrange(((p*p)-3)/2, ( (n+1)-3)/2, p):\
> candidates[q] = 0
> return [2] + filter(None, candidates)

You make it too easy!

candidates = range(5,...)
....
return [2,3] + ...

Happy New Year to you, too!

tisprimes =

- Gordon

Christian Tismer

unread,

Jan 4, 1999, 3:00:00 AM1/4/99

to

Gordon McMillan wrote:
>
> Christian falls for it:
> ...
> > Well, thank you for saving my family, but too late, I already
> > tried and got Tim by nearly 3:2 (tissieve2).
> > BTW I changed Tim's code back to its original, but added
> > a test for overflow.
>
> Now Christian, surely you know that Tim, who already has
> difficulty avoiding optimiziing to absurdity, will pursue this with
> all the fervor that Geraldo would devote to finding all the *real*
> blondes in Salt Lake City.

Well, I fear you are right.
Where I have to say that the blondes problem is
much more challenging, and I hope Tim will
solve it instead!

> You make it too easy!
>
> candidates = range(5,...)
> ....
> return [2,3] + ...

Hmm. I tried it already, but iteration gets much too
complicated if you filter off multiples of 3 and more.
It works with a compiled language (had a similar race
in 1979 and won, in Algol68:), where the cutoff seems
to be at 3 or 5.

I_guess_the_number_of_blondes_is_a_prime_ly y'rs - chris

Tim Peters

unread,

Jan 5, 1999, 3:00:00 AM1/5/99

to

[Gordon McMillan, playing bookmaker for a prime-sieve battle]

> The gauntlet has been thrown!
>
> I've got Tim at 5:4, (but only because Christian has a family to
> support).
>
> Any takers?

Well, as I wrote, "I'll attach a simple sieve" -- speed was not its goal. I
actually obtained it by *un*optimizing the sieve I use (much like Chris &
Mike's later nightmares; see attached) so that it used list.append again,
and didn't require poor James to reverse-engineer an irrelevant (for his
purposes) index<->number mapping. OTOH, no sieve without *some* "square it"
trickery is worth posting no matter what the purpose <snort>.

Notes:

+ The speed of the attached is indistinguishable (on my platform) from
Mike's mcfsieve3.

+ OTOH, unlike mcfsieve3 or tahsieve1 or tissieve2 it gets the right answer
when passed 25 <wink>.

+ Christian and I implicitly rely on sqrt returning the exact answer when
it's representable, while Mike and TimH rely on n ** 0.5 doing that.
IEEE-754 guarantees the former but not the latter. We're all gambling,
though, for portable code.

+ The attached doesn't need to "insert 2" at the end. But that goes so fast
it's hardly worth avoiding.

+ Agree with Chris that you can't nuke the multiples of 3 too without
slowing it down, simply due to the boost in the # of trips around Python's
eval loop; agree too that it can pay in a compiled language.

+ I don't know a faster method, but have heard such exist <challenging
wink>. In Real Life, it can pay to "bunch up" a number of primes before
doing the cross-out business, then cross out multiple multiples in adjacent
blocks whose size is determined by platform cache characteristics. But,
again, the code to manage that incurs too much overhead in Python -- besides
which, it's insane.

professional-courtesy-to-leave-*something*-for-fortran-ly y'rs - tim

import math

def sieve(n):
if n < 2:
return []

limit = int(math.sqrt(n))
primes = range(1, n+1, 2)
primes[0] = 0
n = len(primes)
for p in primes:
if not p: continue
if p <= limit:
# note that p*p is odd, so no need to subtract 1
# before shifting right -- same thing in the end
for i in xrange((p*p)>>1, n, p):
primes[i] = 0
continue
break
primes[0] = 2
return filter(None, primes)

Christian Tismer

unread,

Jan 5, 1999, 3:00:00 AM1/5/99

to

Tim Peters wrote:
>
> [Gordon McMillan, playing bookmaker for a prime-sieve battle]
> > The gauntlet has been thrown!
> >
> > I've got Tim at 5:4, (but only because Christian has a family to
> > support).
> >
> > Any takers?
>
> Well, as I wrote, "I'll attach a simple sieve" -- speed was not its goal. I
> actually obtained it by *un*optimizing the sieve I use (much like Chris &
> Mike's later nightmares; see attached) so that it used list.append again,
> and didn't require poor James to reverse-engineer an irrelevant (for his
> purposes) index<->number mapping. OTOH, no sieve without *some* "square it"
> trickery is worth posting no matter what the purpose <snort>.

Sorry, I didn't want to make up a race, and I just
had not read your post to the end. Instead, I tried
to make a quick sievelet to help James.

Unfortunately, primes are attractive to insanes like
me, and after Gordon's post it was too late to run.

> Notes:
>
> + The speed of the attached is indistinguishable (on my platform) from
> Mike's mcfsieve3.

Well, same result. The code appears optimal under the
constraint to factor out only even primes.

> + OTOH, unlike mcfsieve3 or tahsieve1 or tissieve2 it gets the right answer
> when passed 25 <wink>.

Thanks, tissieve5 and 6 are cleaned up accordingly.

> + Christian and I implicitly rely on sqrt returning the exact answer when
> it's representable, while Mike and TimH rely on n ** 0.5 doing that.
> IEEE-754 guarantees the former but not the latter. We're all gambling,
> though, for portable code.

I made the same mistake, just to save to import math. Guilty!

> + The attached doesn't need to "insert 2" at the end. But that goes so fast
> it's hardly worth avoiding.

Very elegant. I adopted this in sieve 6.

> + Agree with Chris that you can't nuke the multiples of 3 too without
> slowing it down, simply due to the boost in the # of trips around Python's
> eval loop; agree too that it can pay in a compiled language.

This is not completely true. It took some gray hairs,
but tissieve6 factors out 2 and 3, and still has quite
easy indexing. Computing an array index from a
candidate number can be done by division by 3.
In general, this technique has in fact
its limitations. I don't believe that a simple
formula for index computation is available, when
many prime factors come into play.
That would imply to use a lookup table for indexing,
and then it becomes harder to make the inner loop fast.

But let's see:

>>> def loop1(n):
... x=[0]
... for i in xrange(n):\
... x[0]=i
...
>>> def loop2(n):
... x=[0]
... for i in xrange(n):\
... x[0]=i; p=x[0]
...
>>> def loop3(n):
... x=[1, 1]
... for i in xrange(n):\
... x[x[0]]=i
...
>>> from ptools import timing
>>> for f in (loop1, loop2, loop3): print f.__name__, timing(f, 1000000)
loop1 (2.53, None)
loop2 (3.4, None)
loop3 (2.86, None)
>>>

The cost of an extra indirection seems to be around
a factor of 1.4 if a single indirection is enough.
That means, to compensate for this (break-even), the
algorithm must reduce the number of candidates by 1.4.

Here an estimate about what can be saved by factorizing:

>>> p=tissieve5(100)
>>> for k in range(1, 8):
... plis = p[:k]
... base, rem = remainders(plis)
... saving = float(base)/len(rem)
... print "%2d %6d %5d %0.3f" %(k, base, len(rem), saving)
...
1 2 1 2.000
2 6 2 3.000
3 30 8 3.750
4 210 48 4.375
5 2310 480 4.813
6 30030 5760 5.214
7 510510 92160 5.539
>>>

Conclusion: Factorizing with indirect indexing can pay
off at 210 (2, 3, 5, 7) in Python.

In other words: If we limit the factorization to
something reasonable (6 primes), we can reach a speed
gain of about 4, which is twice as fast as timsieve2.
I doubt that this is worth the effort.

tissieve6 is indeed faster that timsieve2 at a ratio of 2:3 .
If easy indexing can be found for 30 (2, 3, 5 factored out),
a slight improvement will be possible.

Here exactly comes the need for a compiler, which makes indexing
much cheaper than list construction. But again, not much more
that a gain of 5.2 can be expected, and we are at the limits
of this algorithm.

> + I don't know a faster method, but have heard such exist <challenging
> wink>. In Real Life, it can pay to "bunch up" a number of primes before
> doing the cross-out business, then cross out multiple multiples in adjacent
> blocks whose size is determined by platform cache characteristics. But,
> again, the code to manage that incurs too much overhead in Python -- besides
> which, it's insane.
>
> professional-courtesy-to-leave-*something*-for-fortran-ly y'rs - tim

If you are adressing my factorization, I think I have shown
that even Fortran would not help so much.
But I'm not sure if I understand the algorithm which
you are referring to.

What we did so far reaches my limits of knowledge about
primes. I'm sure that there are people who know more
tricks to make numbers confess if they are prime.

> import math
>
> def sieve(n):
> if n < 2:
> return []
> limit = int(math.sqrt(n))
> primes = range(1, n+1, 2)
> primes[0] = 0
> n = len(primes)
> for p in primes:
> if not p: continue
> if p <= limit:
> # note that p*p is odd, so no need to subtract 1
> # before shifting right -- same thing in the end
> for i in xrange((p*p)>>1, n, p):
> primes[i] = 0
> continue
> break
> primes[0] = 2
> return filter(None, primes)

Well, here is what I ended up with (and will not continue :)

def tissieve6(n):
if n < 2: return []
candidates = range(-1, n+1, 3)
# yes they are wrong, adjusted later.
sq = math.sqrt(n)
end = sys.maxint
candidates[:2] = [0, 0]
for p in candidates:
if p:
if p%2==0: p=p-1
if p>sq : break
dist = p2 = p+p
if p%3 == 1: dist=dist+dist
dist = dist/3
try: # indexing too far
for q in xrange( (p*p+3)/3, end, p2):\
candidates[q] = 0 ;\
candidates[q+dist] = 0
except:pass
candidates[:2] = [2, 3]
candidates = filter(None, candidates)
for i in xrange(1, len(candidates)):
if candidates[i]%2 == 0:\
candidates[i] = candidates[i]-1
return candidates

yielding the expected timing (n=10**6):

tissieve6 78498 2.075
timsieve2 78498 3.126

primely yours - chris