Re: Why is my code faster with append() in a loop than with a large list?

David

unread,

Jul 5, 2009, 8:48:39 PM7/5/09

to Xavier Ho, pytho...@python.org

Xavier Ho wrote:
> (Here's a short version of the long version below if you don't want to
> read:)
>
> Why is version B of the code faster than version A? (Only three lines
> different)
>
> Version A: http://pastebin.com/f14561243
> Version B: http://pastebin.com/f1f657afc
I don't know but here is the diff for someone that may;

1c1,2
< # This one only took 8 seconds on my machine. Wow?
---
>
> # This one took hours to compute, overnight.
28c29
< powers = [0, 0]
---
> powers = [0] * (num + 1)
32c33
< powers[factor-1] += 1
---
> powers[factor] += 1
35d35
< powers.append(0)
55c55
< n += 1
\ No newline at end of file
---
> n += 1

--
Powered by Gentoo GNU/Linux
http://linuxcrazy.com

MRAB

unread,

Jul 5, 2009, 8:54:39 PM7/5/09

to pytho...@python.org

Xavier Ho wrote:
> (Here's a short version of the long version below if you don't want to
> read:)
>
> Why is version B of the code faster than version A? (Only three lines
> different)
>
> Version A: http://pastebin.com/f14561243
> Version B: http://pastebin.com/f1f657afc
>

> ------------------------------------------------
>
> I was doing the problems on Project Euler for practice with Python last
> night. Problem 12 was to find the value of the first triangular number
> that has over 500 divisors.
> =========================================================================================
>
> The sequence of triangle numbers is generated by adding the natural
> numbers. So the 7^(^th ) triangle number would be 1 + 2 + 3 + 4 + 5 + 6
> + 7 = 28. The first ten terms would be:
>
> 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
>
> Let us list the factors of the first seven triangle numbers:
>
> * 1*: 1
> * 3*: 1,3
> * 6*: 1,2,3,6
> *10*: 1,2,5,10
> *15*: 1,3,5,15
> *21*: 1,3,7,21
> *28*: 1,2,4,7,14,28
>
> We can see that 28 is the first triangle number to have over five divisors.
>
> What is the value of the first triangle number to have over five hundred
> divisors?
>
> =========================================================================================
>
> My initial code was to loop through from 1 to half the number and see
> which were divisors, and as I find them I store them in a list. That
> would have taken days.
>
> My second try was factorising the number each time, and count the
> divisors using the powers of each factor, plus 1, and multiply together.
> The code is here (Version A): http://pastebin.com/f14561243
>
> This worked, but it took overnight to compute. Before I went to bed a
> friend of mine caught me online, and apparently left me a working
> version under 8 seconds with only 3 line difference.
> The code is here (Version B): http://pastebin.com/f1f657afc
>
> That was amazing. But I have no idea why his edit makes it so much
> faster. I did a test to see whether if append() was faster (which I
> doubted) than defining a list with a large size to begin with, and I was
> right:
> http://pastebin.com/f4b46d0db
> Which shows that appending is 40x slower, and was expected. But I still
> can't puzzle out why his use of appending in Version B was so much
> faster than mine.
>
> Any insights would be welcome. I'm going on a family trip, though, so my
> replies may delay.
>
In your version you're creating a list of (num + 1) elements, but in the
other version the list is only as long as the largest factor.

For example, for num=28, your version creates a list 29 elements long,
but the other version creates one only 7 elements long. Also, 'the time
needed to append an item to the list is "amortized constant"' (quoted
from http://effbot.org/zone/python-list.htm).

This means that your little speed test isn't representation of what's
actually happening.

Vilya Harvey

unread,

Jul 6, 2009, 5:21:07 AM7/6/09

to Xavier Ho, pytho...@python.org

2009/7/6 Xavier Ho <con...@xavierho.com>:

> Why is version B of the code faster than version A? (Only three lines
> different)

Here's a guess:

As the number you're testing gets larger, version A is creating very
big list. I'm not sure exactly how much overhead each list entry has
in python, but I guess it's at least 8 bytes: a 32-bit reference for
each list entry, and 32 bits to hold the int value (assuming a 32-bit
version of python). The solution you're looking for is a large 8 digit
number; let's say 80,000,000, for the sake of easy calculation. That
means, as you get close to the solution, you'll be trying to allocate
almost 640 Mb of memory for every number you're checking. That's going
to make the garbage collector work extremely hard. Also, depending on
how much memory your computer has free, you'll probably start hitting
virtual memory too, which will slow you down even further. Finally,
the reduce step has to process all 80,000,000 elements which is
clearly going to take a while.

Version b creates a list which is only as long as the largest prime
factor, so at worst the list size will be approx. sqrt(80,000,000),
which is approx. 8900 elements or approx. 72 Kb or memory - a much
more manageable size.

Hope that helps,

Vil.

Dave Angel

unread,

Jul 6, 2009, 6:09:46 AM7/6/09

to Xavier Ho, pytho...@python.org

Xavier Ho wrote:
> (Here's a short version of the long version below if you don't want to
> read:)
>

> Why is version B of the code faster than version A? (Only three lines
> different)
>

> Version A: http://pastebin.com/f14561243
> Version B: http://pastebin.com/f1f657afc
>
> ------------------------------------------------
>
> I was doing the problems on Project Euler for practice with Python last
> night. Problem 12 was to find the value of the first triangular number that
> has over 500 divisors.
> =========================================================================================
>
> The sequence of triangle numbers is generated by adding the natural numbers.

> So the 7[image: ^(]th[image: )] triangle number would be 1 + 2 + 3 + 4 + 5 +

> Best regards,
>
> Ching-Yun "Xavier" Ho, Technical Artist
>
> Contact Information
> Mobile: (+61) 04 3335 4748
> Skype ID: SpaXe85
> Email: con...@xavierho.com
> Website: http://xavierho.com/
>
>
Just by inspection, it would seem the bottleneck in your first version
is that you return a huge list of nearly all zeroes, from factorize().
This slows down countDivisors() a great deal.

It would probably save some time to not bother storing the zeroes in the
list at all. And it should help if you were to step through a list of
primes, rather than trying every possible int. Or at least constrain
yourself to odd numbers (after the initial case of 2).

DaveA

MRAB

unread,

Jul 6, 2009, 10:22:23 AM7/6/09

to pytho...@python.org

Dave Angel wrote:
[snip]

> It would probably save some time to not bother storing the zeroes in the
> list at all. And it should help if you were to step through a list of
> primes, rather than trying every possible int. Or at least constrain
> yourself to odd numbers (after the initial case of 2).
>

Or stop looking for more factors when you've passed the square root of
num. I don't know what effect there'll be on the time if you recalculate
the square root when num changes (expensive calculation vs smaller
search space).

Piet van Oostrum

unread,

Jul 6, 2009, 11:15:47 AM7/6/09

to

>>>>> Dave Angel <da...@dejaviewphoto.com> (DA) wrote:

>DA> It would probably save some time to not bother storing the zeroes in the
>DA> list at all. And it should help if you were to step through a list of
>DA> primes, rather than trying every possible int. Or at least constrain
>DA> yourself to odd numbers (after the initial case of 2).

The first and the last save a constant factor (slightly less than 2):

def factorise(num):
"""Returns a list of prime factor powers. For example:
factorise(6) will return
[2, 2] (the powers are returned one higher than the actual value)
as in, 2^1 * 3^1 = 6."""
powers = []
factor = 2
while num > 1:
power = 0
while num % factor == 0:
power += 1
num /= factor
if power > 0:
powers.append(power+1)
factor += 1
return powers

...
return reduce(mul, powers)

or to skip the odd factors:

def factorise(num):
"""Returns a list of prime factor powers. For example:
factorise(6) will return
[2, 2] (the powers are returned one higher than the actual value)
as in, 2^1 * 3^1 = 6."""
powers = []
factor = 2
while num > 1:
power = 0
while num % factor == 0:
power += 1
num /= factor
if power > 0:
powers.append(power+1)
factor = 3 if factor == 2 else factor + 2
return powers

This can be slightly optimised by taking factor 2 out of the loop.

def factorise(num):
"""Returns a list of prime factor powers. For example:
factorise(6) will return
[2, 2] (the powers are returned one higher than the actual value)
as in, 2^1 * 3^1 = 6."""
powers = []
power = 0
while num % 2 == 0:
power += 1
num /= 2
if power > 0:
powers.append(power+1)
factor = 3
while num > 1:
power = 0
while num % factor == 0:
power += 1
num /= factor
if power > 0:
powers.append(power+1)
factor += 2
return powers

To restrict the search to primes you would have to use a
sieve of Eratosthenes or something similar.
My first attempt (with a sieve from
http://code.activestate.com/recipes/117119/) only gave a speed decrease!!
But this had the sieve recreated for every triangle number. A global
sieve that is reused at each triangle number is better. But the speed
increase relative to the odd factors only is not dramatical.

# Based upon http://code.activestate.com/recipes/117119/

D = {9: 6} # contains composite numbers
Dlist = [2, 3] # list of already generated primes

def sieve():
'''generator that yields all prime numbers'''
global D
global Dlist
for p in Dlist:
yield p
q = Dlist[-1]+2
while True:
if q in D:
p = D[q]
x = q + p
while x in D: x += p
D[x] = p
else:
Dlist.append(q)
yield q
D[q*q] = 2*q
q += 2

def factorise(num):
"""Returns a list of prime factor powers. For example:
factorise(6) will return
[2, 2] (the powers are returned one higher than the actual value)
as in, 2^1 * 3^1 = 6."""
powers = []
power = 0
for factor in sieve():
power = 0
while num % factor == 0:
power += 1
num /= factor
if power > 0:
# if you really want the factors then append((factor, power))
powers.append(power+1)
if num == 1:
break
return powers

--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

Piet van Oostrum

unread,

Jul 6, 2009, 11:51:02 AM7/6/09

to

Sorry, there was an error in the sieve in my last example. Here is a
corrected version:

D = {9: 6} # contains composite numbers
Dlist = [2, 3] # list of already generated primes
def sieve():
'''generator that yields all prime numbers'''
global D
global Dlist

for q in Dlist:
yield q
while True:
q += 2
p = D.pop(q, 0)
if p:

x = q + p
while x in D: x += p
D[x] = p
else:
Dlist.append(q)

D[q*q] = 2*q
yield q

Scott David Daniels

unread,

Jul 6, 2009, 12:53:34 PM7/6/09

to

Piet van Oostrum wrote:
>>>>>> Dave Angel <da...@dejaviewphoto.com> (DA) wrote:
>
>> DA> It would probably save some time to not bother storing the zeroes in the
>> DA> list at all. And it should help if you were to step through a list of
>> DA> primes, rather than trying every possible int. Or at least constrain
>> DA> yourself to odd numbers (after the initial case of 2).
>

> ...

> # Based upon http://code.activestate.com/recipes/117119/
>
> D = {9: 6} # contains composite numbers

XXX Dlist = [2, 3] # list of already generated primes
Elist = [(2, 4), (3, 9)] # list of primes and their squares
>
XXX def sieve():
XXX '''generator that yields all prime numbers'''
XXX global D
XXX global Dlist
def sieve2():
'''generator that yields all primes and their squares'''
# No need for global declarations, we alter, not replace
XXX for p in Dlist:
XXX yield p
XXX q = Dlist[-1]+2

for pair in Elist:
yield pair
q = pair[0] + 2

> while True:
> if q in D:
> p = D[q]
> x = q + p
> while x in D: x += p
> D[x] = p
> else:

XXX Dlist.append(q)
XXX yield q
XXX D[q*q] = 2*q
square = q * q
pair = q, square
Elist.append(pair)
yield pair
D[square] = 2 * q

> q += 2
>
> def factorise(num):
> """Returns a list of prime factor powers. For example:
> factorise(6) will return
> [2, 2] (the powers are returned one higher than the actual value)
> as in, 2^1 * 3^1 = 6."""
> powers = []
> power = 0

XXX for factor in sieve():
for factor, limit in sieve2():

> power = 0
> while num % factor == 0:
> power += 1
> num /= factor

XXX if power > 0:
if power: # good enough here, and faster

> # if you really want the factors then append((factor, power))
> powers.append(power+1)

XXX if num == 1:
XXX break
XXX return powers
if num < limit:
if num > 1:
# if you really want the factors then append((num, 1))
powers.append(2)
return powers

OK, that's a straightforward speedup, _but_:
factorize(6) == [2, 2] == factorize(10) == factorize(15)
So I am not sure exactly what you are calculating.

--Scott David Daniels
Scott....@Acm.Org

Dave Angel

unread,

Jul 6, 2009, 12:49:49 PM7/6/09

to MRAB, pytho...@python.org

MRAB wrote:
> <div class="moz-text-flowed" style="font-family: -moz-fixed">Dave
> Angel wrote:
> [snip]

>> It would probably save some time to not bother storing the zeroes in

>> the list at all. And it should help if you were to step through a
>> list of primes, rather than trying every possible int. Or at least
>> constrain yourself to odd numbers (after the initial case of 2).
>>
> Or stop looking for more factors when you've passed the square root of
> num. I don't know what effect there'll be on the time if you recalculate
> the square root when num changes (expensive calculation vs smaller
> search space).
>

> </div>
>
But if I remember the code, it stopped when the quotient is one, which
is usually sooner than the square root. And no need to precalculate the
square root.

Dave Angel

unread,

Jul 6, 2009, 1:13:48 PM7/6/09

to Scott David Daniels, pytho...@python.org

Scott David Daniels wrote:
> <div class="moz-text-flowed" style="font-family: -moz-fixed">Piet van

> </div>
>
The OP only needed the number of factors, not the actual factors. So
the zeroes in the list are unneeded. 6, 10, and 15 each have 4 factors.

MRAB

unread,

Jul 6, 2009, 1:20:59 PM7/6/09

to pytho...@python.org

Dave Angel wrote:
> MRAB wrote:
>> <div class="moz-text-flowed" style="font-family: -moz-fixed">Dave
>> Angel wrote:
>> [snip]

>>> It would probably save some time to not bother storing the zeroes in

>>> the list at all. And it should help if you were to step through a
>>> list of primes, rather than trying every possible int. Or at least
>>> constrain yourself to odd numbers (after the initial case of 2).
>>>
>> Or stop looking for more factors when you've passed the square root of
>> num. I don't know what effect there'll be on the time if you recalculate
>> the square root when num changes (expensive calculation vs smaller
>> search space).
>>
>> </div>
>>
> But if I remember the code, it stopped when the quotient is one, which
> is usually sooner than the square root. And no need to precalculate the
> square root.
>

If the number is a large prime then the code will try all the numbers up
to that, eg if num == 1000003 then it'll try 2..1000003 even though it
could've given up after 1000.

Dave Angel

unread,

Jul 6, 2009, 2:28:05 PM7/6/09

to MRAB, pytho...@python.org

> </div>
>
That's one reason I suggested (and Piet implemented) a sieve. You can
stop dividing when the square of the next prime exceeds your quotient.

Piet van Oostrum

unread,

Jul 6, 2009, 6:51:24 PM7/6/09

to

>>>>> Scott David Daniels <Scott....@Acm.Org> (SDD) wrote:

>SDD> # No need for global declarations, we alter, not replace

Yes, I know, but I find it neater to do the global declarations, if only
for documentation. And they don't affect the run time, only the compile
time.

Piet van Oostrum

unread,

Jul 6, 2009, 6:58:00 PM7/6/09

to

>>>>> Dave Angel <da...@ieee.org> (DA) wrote:

>DA> MRAB wrote:
>>> <div class="moz-text-flowed" style="font-family: -moz-fixed">Dave Angel
>>> wrote:
>>> [snip]
>>>> It would probably save some time to not bother storing the zeroes in the
>>>> list at all. And it should help if you were to step through a list of
>>>> primes, rather than trying every possible int. Or at least constrain
>>>> yourself to odd numbers (after the initial case of 2).
>>>>
>>> Or stop looking for more factors when you've passed the square root of
>>> num. I don't know what effect there'll be on the time if you recalculate
>>> the square root when num changes (expensive calculation vs smaller
>>> search space).
>>>
>>> </div>
>>>

>DA> But if I remember the code, it stopped when the quotient is one, which is
>DA> usually sooner than the square root. And no need to precalculate the
>DA> square root.

That's right. I thought of doing the sqrt trick by testing for num <
factor, i.e."

if num < factor:
break
but found out that it is useless, as for each factor found, num is
divided by that factor as many times as possible. Therefore when the
largest prime factor of num has been found, num ends up being 1 and the
loop stops.