Here's a very Pythonic (IMHO) implementation that keeps going and going and
going ...:
from itertools import count
from math import sqrt
def prime_gen():
    """
    Generate all prime numbers
    """
    primes = []
    for n in count(2):
        if all(n%p for p in primes if p < sqrt(n)):
            primes.append(n)
            yield n
The use of all() is particularly nifty (see
http://code.activestate.com/recipes/576640/). And so is the way in which the
list comprehension easily incorporates the sqrt(n) optimization.
Question: Is there a way to implement this algorithm using generator
expressions only -- no "yield" statements allowed?
BTW -- thank you, John Machin, for the spanking ('I'd worry about "correct"
before "Pythonic"') in a previous message. I *did* manage to post a
correction to the needless conversion of a string to a list:
b in list("\r\n\t")
... a few hours before your message arrived (Wed Apr 1 19:44:01 CEST 2009)!
p <= sqrt(n) works a little better :^)
-Mark
>> p <= sqrt(n) works a little better :^)
>>
>> -Mark
>>
Right you are -- I found that bug in my last-minute check, and then I forgot
to transcribe the fix into the email message. Duh -- thanks!
-John
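For anyone following along in a modern interpreter, the corrected condition can be sanity-checked like this (my own sketch, ported to Python 3; the thread's code is Python 2). With `<` instead of `<=`, 4 slips through as a "prime", since no cached prime is strictly less than sqrt(4):

```python
from itertools import count, islice
from math import sqrt

def prime_gen():
    """Generate all prime numbers (with Mark's fix: p <= sqrt(n))."""
    primes = []
    for n in count(2):
        if all(n % p for p in primes if p <= sqrt(n)):
            primes.append(n)
            yield n

first_ten = list(islice(prime_gen(), 10))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] -- no stray 4
```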
No. You refer to the list being built in the code for building the list
(very cute), which requires that the list be bound to a name at the
start of the process rather than just when complete (which is never ;-).
tjr
def prime_gen():
    primes = []
    return (primes.append(n) or n for n in count(2)
            if all(n%p for p in primes if p <= sqrt(n)))
That version is only marginally faster than your original version.
The biggest performance penalty is that the (... for p in primes ...)
generator isn't aborted once p > sqrt(n). Here's a less nifty but much
more efficient version:
def prime_gen():
    prime_list = [2]
    for p in prime_list: yield p
    for n in itertools.count(prime_list[-1] + 1):
        for p in prime_list:
            if p * p > n:
                prime_list.append(n)
                yield n
                break
            elif n % p == 0:
                break
        else:
            raise Exception("Shouldn't have run out of primes!")
When generating the first 1000 primes, this version's approximately 20
times faster; for the first 10,000 primes, ~80x (but still much slower
than a simple Sieve of Eratosthenes). To make repeated calls faster,
move prime_list = [2] outside the function.
-Miles
P.S. Gmail shows all your messages with a blank body and a text
attachment containing your message; this is perhaps because your
mailer includes an invalid blank Content-disposition header.
Yes. Avoiding the yield statement is easy, but one might end up with two
statements because one has to produce a side effect on the primes list.
However, we can use default parameters in lambdas and finally get a
single expression, which is a generator expression:
g = (lambda primes = []:
        (n for n in count(2)
         if (lambda n, primes:
                (n in primes if primes and n <= primes[-1]
                 else (primes.append(n) or True
                       if all(n%p for p in primes if p <= sqrt(n))
                       else False)))(n, primes)))()
assert g.next() == 2
assert g.next() == 3
assert g.next() == 5
assert g.next() == 7
> def prime_gen():
>     primes = []
>     return (primes.append(n) or n for n in count(2)
>             if all(n%p for p in primes if p <= sqrt(n)))
>
> That version is only marginally faster than your original version. The
> biggest performance penalty is that the (... for p in primes ...)
> generator isn't aborted once p > sqrt(n).
That's a big performance penalty in practice. For n=100, it means that
ninety percent of the loop iterations are pointless; for n = 10,000 it
means that 99% of the loops are pointless; asymptotically, it means that
almost all the work done is wasted.
http://en.wikipedia.org/wiki/Almost_all
Consequently, retrieving the next prime becomes slower and slower as you
iterate over the generator. By my tests, it is about 40 times slower to
go from the 2000th prime to the 2001st prime (17389 and 17393) than it is
to go from the sixth prime to the seventh (13 and 17).
In other words, this is a crappy way to generate primes. It's a trick,
not a good algorithm: slow and inefficient, as well as obfuscated.
> Here's a less nifty but much more efficient version:
>
> def prime_gen():
>     prime_list = [2]
>     for p in prime_list: yield p
>     for n in itertools.count(prime_list[-1] + 1):
>         for p in prime_list:
>             if p * p > n:
>                 prime_list.append(n)
>                 yield n
>                 break
>             elif n % p == 0:
>                 break
>         else:
>             raise Exception("Shouldn't have run out of primes!")
That's pretty sweet, but we can make it even faster. We can speed things
up by incrementing by two instead of one, to avoid pointlessly testing
even numbers that we know must fail. We can also speed things up a tad by
performing the more common test first, and by storing squares rather than
recalculating them each time through the loop.
def prime_gen2():
    yield 2
    n = 3
    yield n
    prime_list = [(2, 4), (3, 9)]
    it = itertools.count(4)
    for n in it:
        n = it.next()
        for p, p2 in prime_list:
            if n % p == 0:
                break
            elif p2 > n:
                prime_list.append((n, n*n))
                yield n
                break
        else:
            raise RuntimeError("Shouldn't have run out of primes!")
Generating the first million primes is 10% faster using this compared to
your improved version.
--
Steven
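For what it's worth, here is a Python 3 rendering of that idea (my own sketch, not Steven's posting): count(5, 2) steps through the odd candidates directly, replacing the it.next() stepping trick, and next() replaces the removed .next() method.

```python
import itertools

def prime_gen2():
    yield 2
    yield 3
    prime_list = [(2, 4), (3, 9)]    # (prime, prime squared)
    for n in itertools.count(5, 2):  # odd candidates only
        for p, p2 in prime_list:
            if n % p == 0:           # the common case, tested first
                break
            elif p2 > n:             # no factor <= sqrt(n): prime
                prime_list.append((n, n * n))
                yield n
                break
        else:
            raise RuntimeError("Shouldn't have run out of primes!")

g = prime_gen2()
first = [next(g) for _ in range(12)]
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```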
> g = (lambda primes = []:
>         (n for n in count(2)
>          if (lambda n, primes:
>                 (n in primes if primes and n <= primes[-1]
>                  else (primes.append(n) or True
>                        if all(n%p for p in primes if p <= sqrt(n))
>                        else False)))(n, primes)))()
Um ... did you say "easy"? :-) This is prodigious, and it's definitely a
solution to the generator-expression-only challenge I laid down. But I
think this is a case in which using a generator expression causes a loss
in "Pythonicity", rather than a gain. Many, many thanks for this!
Steven D'Aprano said:
> In other words, this is a crappy way to generate primes. It's a trick,
> not a good algorithm: slow and inefficient, as well as obfuscated.
I wasn't thinking at all about efficiency, so I'm not surprised that my
implementation -- a pretty simple adaptation of
http://code.activestate.com/recipes/576640 -- scores low in that
dimension. I was mainly looking for Pythonicity. I'm sure this means
different things to different people, PEP 20 notwithstanding. To me,
this is an important aspect of Pythonicity:
The final Python code looks like your initial back-of-the-napkin pseudo
code.
In the old days, that meant using filter(), because this function
mirrors the "sieve" concept in the Sieve of Eratosthenes. These days, I
believe that this condition:
if all(n%p for p in primes if p < sqrt(n)):
... closely resembles this pseudo code:
if the number cannot be evenly divided by all the primes we've
generated so far (and we can stop testing at the square root)
So I believe this algorithm is the *opposite* of obfuscated. It's
crystal clear (albeit a slowpoke).
Miles (and others) said:
> P.S. Gmail shows all your messages with a blank body and a text
> attachment containing your message; this is perhaps because your
> mailer includes an invalid blank Content-disposition header.
It might be my ISP (AT&T Yahoo), or it might be MS-Outlook. I'm sending
this message using Thunderbird 2.0.0.21. Any better?
-John
That's because it is *one* expression. The avoidance of named
functions makes it look obfuscated or prodigious. Once it is properly
dissected it doesn't look that amazing anymore.
Start with:
(n for n in count(2) if is_prime(n, primes))
The is_prime function has the following implementation:
def is_prime(n, primes):
    if primes and n <= primes[-1]:
        return n in primes
    else:
        if all(n%p for p in primes if p <= sqrt(n)):
            primes.append(n)
            return True
        else:
            return False
But we need a lambda-fied function expression instead of the function
definition. This yields:
is_prime = lambda n, primes: (n in primes
                              if primes and n <= primes[-1]
                              else (primes.append(n) or True
                                    if all(n%p for p in primes if p <= sqrt(n))
                                    else False))
Finally, the trick of defining primes within an expression, by means of a
default argument in the lambda, is applied.
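The dissected form runs as-is once transcribed; here it is in Python 3 (my transcription -- the thread's interpreter would use g.next() instead of next(g)):

```python
from itertools import count, islice
from math import sqrt

primes = []  # the cache threaded through is_prime

def is_prime(n, primes):
    if primes and n <= primes[-1]:
        return n in primes           # already decided: just look it up
    if all(n % p for p in primes if p <= sqrt(n)):
        primes.append(n)             # side effect: grow the cache
        return True
    return False

g = (n for n in count(2) if is_prime(n, primes))
first = list(islice(g, 6))           # [2, 3, 5, 7, 11, 13]
```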
> But I
> think this is a case in which using a generator expression causes a loss
> in "Pythonicity", rather than a gain. Many, many thanks for this!
I think so too. The solution with the simple generator expression and
the fully defined is_prime function may be just adequate.
> That's because it is *one* expression. The avoidance of named
> functions makes it look obfuscated or prodigious. Once it is properly
> dissected it doesn't look that amazing anymore.
>
> Start with:
>
> (n for n in count(2) if is_prime(n, primes))
>
> The is_prime function has the following implementation:
>
> def is_prime(n, primes):
>     if primes and n <= primes[-1]:
>         return n in primes
>     else:
>         if all(n%p for p in primes if p <= sqrt(n)):
>             primes.append(n)
>             return True
>         else:
>             return False
Your explication is excellent, Kay! In the is_prime() function, can't we
omit the first three lines (if ... else), since it will *always* be the
case that "n" exceeds all the primes we've gathered so far? I've tested
the code with those lines omitted -- both with the separate is_prime()
function and with the generator-expression-only implementation. It seems
to work fine. Ex:
---
from itertools import count
from math import sqrt
g = (lambda primes = []:
    (n for n in count(2) if
        (lambda x, primes:
            (primes.append(x) or True
             if all(x%p for p in primes if p <= sqrt(x))
             else False)
        )(n, primes)
    )
)()

for i in range(500):
    print g.next(),
---
Note that we don't need any backslashes, either.
Yes, sure. I developed is_prime as a standalone function with the
primes list as a cache and just lambda-fied it.
> I've
> tested the code with these lines omitted -- both with the separate
> is_prime() function and with the generator-expression-only
> implementation. It seems to work fine. Ex:
>
> ---
> from itertools import count
> from math import sqrt
>
> g = (lambda primes = []:
>     (n for n in count(2) if
>         (lambda x, primes:
>             (primes.append(x) or True
>              if all(x%p for p in primes if p <= sqrt(x))
>              else False)
>         )(n, primes)
>     )
> )()
This is of course much more accessible and doesn't even look weird.
Regards
Thanks! I think the only weirdness remaining is:
primes.append(x) or True
... which is a trick to force a True value after primes.append(x)
returns None.
BTW, for extra cleanliness, the parentheses can be omitted around lines
4-7, the inner lambda expression.
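The trick works because list.append always returns None, which is falsy, so the or falls through to its right operand. A two-line demonstration (mine, not from the thread):

```python
cache = []
# append() mutates the list and returns None; `or True` then makes the
# whole expression evaluate to True, so it can serve as a filter condition.
result = cache.append(42) or True
```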
Here is another speedup (from my prime pair hunt days):
Check only 2 of every 6 integers.
[6N + 0, 6N + 2, 6N + 4] are divisible by 2, [6N + 0, 6N + 3] by 3.
def prime_gen3():
    check_list = [(5, 25)]
    yield 2
    yield 3
    yield 5
    n = 5
    for incr in itertools.cycle((2, 4)):
        n += incr
        for p, p_squared in check_list:
            if p_squared > n:
                check_list.append((n, n * n))
                yield n
                break
            elif n % p == 0:
                break
        else:
            raise Exception("Shouldn't have run out of primes!")
--Scott David Daniels
Scott....@Acm.Org
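A small sketch (my own) of what cycle((2, 4)) buys here: starting at 5 and alternately adding 2 and 4 visits exactly the numbers of the form 6N ± 1, i.e. every integer not divisible by 2 or 3.

```python
from itertools import cycle, islice

def wheel_candidates():
    """Yield 5, then every integer of the form 6N +/- 1."""
    n = 5
    for incr in cycle((2, 4)):
        yield n
        n += incr

first = list(islice(wheel_candidates(), 8))
# [5, 7, 11, 13, 17, 19, 23, 25] -- note 25: these are only
# *candidates*, which is why prime_gen3 still trial-divides them.
```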
> Inspired by recent threads (and recalling my first message to Python
> edu-sig), I did some Internet searching on producing prime numbers using
> Python generators. Most algorithms I found don't go for the infinite,
> contenting themselves with "list all the primes below a given number".
>
> Here's a very Pythonic (IMHO) implementation that keeps going and
> going and going ...:
>
> from itertools import count
> from math import sqrt
>
> def prime_gen():
>     """
>     Generate all prime numbers
>     """
>     primes = []
>     for n in count(2):
>         if all(n%p for p in primes if p < sqrt(n)):
>             primes.append(n)
>             yield n
You could do something like this with the help of itertools.ifilter:
prime_gen = ifilter(
    lambda n, P=[]: all(n%p for p in P) and not P.append(n),
    count(2)
)
I omitted the 'if p <= sqrt(n)' as it doesn't help the generator go
faster. You could use itertools.takewhile for that:
prime_gen = ifilter(
    lambda n, P=[]:
        all(n%p for p in takewhile(lambda p, s=n**0.5: p<=s, P))
        and not P.append(n),
    count(2)
)
Of course there is no excuse for writing Python like that :)
--
Arnaud
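In Python 3, itertools.ifilter is gone and the built-in filter is lazy, so Arnaud's version ports almost verbatim (my sketch, still with no excuse for writing it):

```python
from itertools import count, takewhile, islice

# filter() in Python 3 is lazy, playing the role ifilter() did in Python 2.
# The default argument P=[] persists across calls and serves as the cache;
# the inner lambda's default s=n**0.5 precomputes the square root per n.
prime_gen = filter(
    lambda n, P=[]:
        all(n % p for p in takewhile(lambda p, s=n ** 0.5: p <= s, P))
        and not P.append(n),
    count(2),
)

first = list(islice(prime_gen, 10))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```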
Do you know what in the itertools implementation causes adding an 'if p <= sqrt(n)'
clause to *decrease* performance, while adding a 'takewhile()' clause *increases* it?
> Of course there is no excuse for writing Python like that :)
+1 (but it's fun every once in a while, eh?)
Thanks a lot! -John
> Do you know what in the itertools implementation causes adding an 'if
> p <= sqrt(n)' clause to *decrease* performance, while adding a
> 'takewhile()' clause *increases* it?
I haven't timed it, but I would guess that the takewhile was faster
only because the sqrt(n) had been factored out of the loop. Try the
original loop again, precalculating sqrt(n), and see how that compares.
Most likely that's the explanation.
--
Arnaud
**** all prev. primes
3.97347791658
**** prev. primes that don't exceed SQRT
6.23250413579
**** [CACHED] prev. primes that don't exceed SQRT
3.45557325672
**** [TAKEWHILE] prev. primes that don't exceed SQRT
0.371197373201
**** [TAKEWHILE] squares, using *, of prev. primes that don't exceed N
0.358001074011
**** [TAKEWHILE] squares, using **, of prev. primes that don't exceed N
0.383540147515
**** [TAKEWHILE, CACHED] squares, using *, of prev. primes that don't exceed N
0.410309506343
**** [TAKEWHILE, CACHED] squares, using **, of prev. primes that don't exceed N
0.401269222462
So ... adding the SQRT optimization degrades the performance
significantly, but adding CACHING to the SQRT optimization swings the
performance pendulum back to the "benefit" side. TAKEWHILE is a big win,
but adding CACHING to TAKEWHILE degrades performance.
The code (110 lines) is at http://cl1p.net/python_prime_generators/
-John
Which of course is rubbish. Extracting the sqrt will have an effect, but
the main factor is that takewhile exits the loop as soon as the condition
is false, whereas a conditional in a generator expression doesn't stop
the loop from continuing to the end.
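The difference is easy to observe (my sketch): wrap the source in something that records each element actually pulled, and compare how far each form reads.

```python
from itertools import takewhile

def traced(items, log):
    """Yield items, recording every element actually pulled."""
    for x in items:
        log.append(x)
        yield x

data = [1, 2, 3, 4, 5, 6]

pulled_if = []
kept_if = [x for x in traced(data, pulled_if) if x <= 3]

pulled_tw = []
kept_tw = list(takewhile(lambda x: x <= 3, traced(data, pulled_tw)))

# Both keep [1, 2, 3], but the `if` form pulls all six elements while
# takewhile stops after the first failing one (4).
```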
Absolutely! Since you hadn't quoted the original code, I'd forgotten
that it wasn't stopping after n**0.5.
--
Arnaud
-1 entertaining, list. Come on, people, the show must go on. Take
five.