Here's a very Pythonic (IMHO) implementation that keeps going and going and
going ...:
from itertools import count
from math import sqrt
def prime_gen():
    """
    Generate all prime numbers
    """
    primes = []
    for n in count(2):
        if all(n%p for p in primes if p < sqrt(n)):
            primes.append(n)
            yield n
The use of all() is particularly nifty (see
http://code.activestate.com/recipes/576640/). And so is the way in which the
list comprehension easily incorporates the sqrt(n) optimization.
Question: Is there a way to implement this algorithm using generator
expressions only -- no "yield" statements allowed?
BTW -- thank you, John Machin, for the spanking ('I'd worry about "correct"
before "Pythonic"') in a previous message. I *did* manage to post a
correction to the needless conversion of a string to a list:
b in list("\r\n\t")
... a few hours before your message arrived (Wed Apr 1 19:44:01 CEST 2009)!
p <= sqrt(n) works a little better :^)
-Mark
>> p <= sqrt(n) works a little better :^)
>>
>> -Mark
>>
Right you are -- I found that bug in my last-minute check, and then I forgot
to transcribe the fix into the email message. Duh -- thanks!
-John
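For anyone following along in a modern interpreter, the corrected condition can be sanity-checked like this (my own sketch, ported to Python 3; the thread's code is Python 2). With `<` instead of `<=`, 4 slips through as a "prime", since no cached prime is strictly less than sqrt(4):

```python
from itertools import count, islice
from math import sqrt

def prime_gen():
    """Generate all prime numbers (with Mark's fix: p <= sqrt(n))."""
    primes = []
    for n in count(2):
        if all(n % p for p in primes if p <= sqrt(n)):
            primes.append(n)
            yield n

first_ten = list(islice(prime_gen(), 10))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] -- no stray 4
```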
No. You refer to the list being built in the code for building the list
(very cute), which requires that the list be bound to a name at the
start of the process rather than just when complete (which is never ;-).
tjr
def prime_gen():
    primes = []
    return (primes.append(n) or n for n in count(2)
            if all(n%p for p in primes if p <= sqrt(n)))
That version is only marginally faster than your original version.
The biggest performance penalty is that the (... for p in primes ...)
generator isn't aborted once p > sqrt(n). Here's a less nifty but much
more efficient version:
def prime_gen():
    prime_list = [2]
    for p in prime_list: yield p
    for n in itertools.count(prime_list[-1] + 1):
        for p in prime_list:
            if p * p > n:
                prime_list.append(n)
                yield n
                break
            elif n % p == 0:
                break
        else:
            raise Exception("Shouldn't have run out of primes!")
When generating the first 1000 primes, this version's approximately 20
times faster; for the first 10,000 primes, ~80x (but still much slower
than a simple Sieve of Eratosthenes). To make repeated calls faster,
move prime_list = [2] outside the function.
-Miles
P.S. Gmail shows all your messages with a blank body and a text
attachment containing your message; this is perhaps because your
mailer includes an invalid blank Content-disposition header.
Yes. Avoiding the yield statement is easy, but one might end up with two
statements because one has to produce a side effect on the primes list.
However, we can use default parameters in lambdas and finally get a
single expression, which is a generator expression:
g = (lambda primes = []:
        (n for n in count(2)
         if (lambda n, primes:
                (n in primes if primes and n <= primes[-1]
                 else (primes.append(n) or True
                       if all(n%p for p in primes if p <= sqrt(n))
                       else False)))(n, primes)))()
assert g.next() == 2
assert g.next() == 3
assert g.next() == 5
assert g.next() == 7
> def prime_gen():
>     primes = []
>     return (primes.append(n) or n for n in count(2)
>             if all(n%p for p in primes if p <= sqrt(n)))
>
> That version is only marginally faster than your original version. The
> biggest performance penalty is that the (... for p in primes ...)
> generator isn't aborted once p > sqrt(n).
That's a big performance penalty in practice. For n=100, it means that
ninety percent of the loop iterations are pointless; for n = 10,000 it
means that 99% of the loops are pointless; asymptotically, it means that
almost all the work done is wasted.
http://en.wikipedia.org/wiki/Almost_all
Consequently, retrieving the next prime becomes slower and slower as you
iterate over the generator. By my tests, it is about 40 times slower to
go from the 2000th prime to the 2001st prime (17389 and 17393) than it is
to go from the sixth prime to the seventh (13 and 17).
In other words, this is a crappy way to generate primes. It's a trick,
not a good algorithm: slow and inefficient, as well as obfuscated.
> Here's a less nifty but much more efficient version:
>
> def prime_gen():
>     prime_list = [2]
>     for p in prime_list: yield p
>     for n in itertools.count(prime_list[-1] + 1):
>         for p in prime_list:
>             if p * p > n:
>                 prime_list.append(n)
>                 yield n
>                 break
>             elif n % p == 0:
>                 break
>         else:
>             raise Exception("Shouldn't have run out of primes!")
That's pretty sweet, but we can make it even faster. We can speed things
up by incrementing by two instead of one, to avoid pointlessly testing
even numbers that we know must fail. We can also speed things up a tad by
performing the more common test first, and by storing squares rather than
recalculating them each time through the loop.
def prime_gen2():
    yield 2
    n = 3
    yield n
    prime_list = [(2, 4), (3, 9)]
    it = itertools.count(4)
    for n in it:
        n = it.next()
        for p, p2 in prime_list:
            if n % p == 0:
                break
            elif p2 > n:
                prime_list.append((n, n*n))
                yield n
                break
        else:
            raise RuntimeError("Shouldn't have run out of primes!")
Generating the first million primes is 10% faster using this compared to
your improved version.
--
Steven
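For what it's worth, here is a Python 3 rendering of that idea (my own sketch, not Steven's posting): count(5, 2) steps through the odd candidates directly, replacing the it.next() stepping trick, and next() replaces the removed .next() method.

```python
import itertools

def prime_gen2():
    yield 2
    yield 3
    prime_list = [(2, 4), (3, 9)]    # (prime, prime squared)
    for n in itertools.count(5, 2):  # odd candidates only
        for p, p2 in prime_list:
            if n % p == 0:           # the common case, tested first
                break
            elif p2 > n:             # no factor <= sqrt(n): prime
                prime_list.append((n, n * n))
                yield n
                break
        else:
            raise RuntimeError("Shouldn't have run out of primes!")

g = prime_gen2()
first = [next(g) for _ in range(12)]
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```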
> g = (lambda primes = []:
>         (n for n in count(2)
>          if (lambda n, primes:
>                 (n in primes if primes and n <= primes[-1]
>                  else (primes.append(n) or True
>                        if all(n%p for p in primes if p <= sqrt(n))
>                        else False)))(n, primes)))()
Um ... did you say "easy"? :-) This is prodigious, and it's definitely a
solution to the generator-expression-only challenge I laid down. But I
think this is a case in which using a generator expression causes a loss
in "Pythonicity", rather than a gain. Many, many thanks for this!
Steven D'Aprano said:
> In other words, this is a crappy way to generate primes. It's a trick,
> not a good algorithm: slow and inefficient, as well as obfuscated.
I wasn't thinking at all about efficiency, so I'm not surprised that my
implementation -- a pretty simple adaptation of
http://code.activestate.com/recipes/576640 -- scores low in that
dimension. I was mainly looking for Pythonicity. I'm sure this means
different things to different people, PEP 20 notwithstanding. To me,
this is an important aspect of Pythonicity:
The final Python code looks like your initial back-of-the-napkin pseudo
code.
In the old days, that meant using filter(), because this function
mirrors the "sieve" concept in the Sieve of Eratosthenes. These days, I
believe that this condition:
if all(n%p for p in primes if p < sqrt(n)):
... closely resembles this pseudo code:
if the number cannot be evenly divided by all the primes we've
generated so far (and we can stop testing at the square root)
So I believe this algorithm is the *opposite* of obfuscated. It's
crystal clear (albeit a slowpoke).
Miles (and others) said:
> P.S. Gmail shows all your messages with a blank body and a text
> attachment containing your message; this is perhaps because your
> mailer includes an invalid blank Content-disposition header.
It might be my ISP (AT&T Yahoo), or it might be MS-Outlook. I'm sending
this message using Thunderbird 2.0.0.21. Any better?
-John
That's because it is *one* expression. The avoidance of named
functions makes it look obfuscated or prodigious. Once it is properly
dissected it doesn't look that amazing anymore.
Start with:
(n for n in count(2) if is_prime(n, primes))
The is_prime function has the following implementation:
def is_prime(n, primes):
    if primes and n <= primes[-1]:
        return n in primes
    else:
        if all(n%p for p in primes if p <= sqrt(n)):
            primes.append(n)
            return True
        else:
            return False
But we need a lambda-fied function expression instead of the function
definition. This yields:
is_prime = lambda n, primes: (n in primes
                              if primes and n <= primes[-1]
                              else (primes.append(n) or True
                                    if all(n%p for p in primes if p <= sqrt(n))
                                    else False))
Finally, the trick of defining primes within an expression, by means of a
default argument in the lambda, is applied.
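The dissected form runs as-is once transcribed; here it is in Python 3 (my transcription -- the thread's interpreter would use g.next() instead of next(g)):

```python
from itertools import count, islice
from math import sqrt

primes = []  # the cache threaded through is_prime

def is_prime(n, primes):
    if primes and n <= primes[-1]:
        return n in primes           # already decided: just look it up
    if all(n % p for p in primes if p <= sqrt(n)):
        primes.append(n)             # side effect: grow the cache
        return True
    return False

g = (n for n in count(2) if is_prime(n, primes))
first = list(islice(g, 6))           # [2, 3, 5, 7, 11, 13]
```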
> But I
> think this is a case in which using a generator expression causes a loss
> in "Pythonicity", rather than a gain. Many, many thanks for this!
I think so too. The solution with the simple generator expression and
the fully defined is_prime function may be just adequate.
> That's because it is *one* expression. The avoidance of named
> functions makes it look obfuscated or prodigious. Once it is properly
> dissected it doesn't look that amazing anymore.
>
> Start with:
>
> (n for n in count(2) if is_prime(n, primes))
>
> The is_prime function has the following implementation:
>
> def is_prime(n, primes):
>     if primes and n <= primes[-1]:
>         return n in primes
>     else:
>         if all(n%p for p in primes if p <= sqrt(n)):
>             primes.append(n)
>             return True
>         else:
>             return False
Your explication is excellent, Kay! In the is_prime() function, can't we
omit the first three lines (if ... else), since it will *always* be the
case that "n" exceeds all the primes we've gathered so far? I've tested
the code with those lines omitted -- both with the separate is_prime()
function and with the generator-expression-only implementation. It seems
to work fine. Ex:
---
from itertools import count
from math import sqrt
g = (lambda primes = []:
    (n for n in count(2) if
        (lambda x, primes:
            (primes.append(x) or True
             if all(x%p for p in primes if p <= sqrt(x))
             else False)
        )(n, primes)
    )
)()

for i in range(500):
    print g.next(),
---
Note that we don't need any backslashes, either.
Yes, sure. I developed is_prime as a standalone function with the
primes list as a cache and just lambda-fied it.
> I've
> tested the code with these lines omitted -- both with the separate
> is_prime() function and with the generator-expression-only
> implementation. It seems to work fine. Ex:
>
> ---
> from itertools import count
> from math import sqrt
>
> g = (lambda primes = []:
>     (n for n in count(2) if
>         (lambda x, primes:
>             (primes.append(x) or True
>              if all(x%p for p in primes if p <= sqrt(x))
>              else False)
>         )(n, primes)
>     )
> )()
This is of course much more accessible and doesn't even look weird.
Regards
Thanks! I think the only weirdness remaining is:
primes.append(x) or True
... which is a trick to force a True value after primes.append(x)
returns None.
BTW, for extra cleanliness, the parentheses can be omitted around lines
4-7, the inner lambda expression.
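The trick works because list.append always returns None, which is falsy, so the or falls through to its right operand. A two-line demonstration (mine, not from the thread):

```python
cache = []
# append() mutates the list and returns None; `or True` then makes the
# whole expression evaluate to True, so it can serve as a filter condition.
result = cache.append(42) or True
```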
Here is another speedup (from my prime pair hunt days):
Check only 2 of every 6 integers.
[6N + 0, 6N + 2, 6N + 4] are divisible by 2, [6N + 0, 6N + 3] by 3.
def prime_gen3():
    check_list = [(5, 25)]
    yield 2
    yield 3
    yield 5
    n = 5
    for incr in itertools.cycle((2, 4)):
        n += incr
        for p, p_squared in check_list:
            if p_squared > n:
                check_list.append((n, n * n))
                yield n
                break
            elif n % p == 0:
                break
        else:
            raise Exception("Shouldn't have run out of primes!")
--Scott David Daniels
Scott....@Acm.Org
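A small sketch (my own) of what cycle((2, 4)) buys here: starting at 5 and alternately adding 2 and 4 visits exactly the numbers of the form 6N ± 1, i.e. every integer not divisible by 2 or 3.

```python
from itertools import cycle, islice

def wheel_candidates():
    """Yield 5, then every integer of the form 6N +/- 1."""
    n = 5
    for incr in cycle((2, 4)):
        yield n
        n += incr

first = list(islice(wheel_candidates(), 8))
# [5, 7, 11, 13, 17, 19, 23, 25] -- note 25: these are only
# *candidates*, which is why prime_gen3 still trial-divides them.
```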
> Inspired by recent threads (and recalling my first message to Python
> edu-sig), I did some Internet searching on producing prime numbers using
> Python generators. Most algorithms I found don't go for the infinite,
> contenting themselves with "list all the primes below a given number".
>
> Here's a very Pythonic (IMHO) implementation that keeps going and
> going and going ...:
>
> from itertools import count
> from math import sqrt
>
> def prime_gen():
>     """
>     Generate all prime numbers
>     """
>     primes = []
>     for n in count(2):
>         if all(n%p for p in primes if p < sqrt(n)):
>             primes.append(n)
>             yield n
You could do something like this with the help of itertools.ifilter:
prime_gen = ifilter(
    lambda n, P=[]: all(n%p for p in P) and not P.append(n),
    count(2)
)
I omitted the 'if p <= sqrt(n)' as it doesn't help the generator go
faster. You could use itertools.takewhile for that:
prime_gen = ifilter(
    lambda n, P=[]:
        all(n%p for p in takewhile(lambda p, s=n**0.5: p<=s, P))
        and not P.append(n),
    count(2)
)
Of course there is no excuse for writing Python like that :)
--
Arnaud
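In Python 3, itertools.ifilter is gone and the built-in filter is lazy, so Arnaud's version ports almost verbatim (my sketch, still with no excuse for writing it):

```python
from itertools import count, takewhile, islice

# filter() in Python 3 is lazy, playing the role ifilter() did in Python 2.
# The default argument P=[] persists across calls and serves as the cache;
# the inner lambda's default s=n**0.5 precomputes the square root per n.
prime_gen = filter(
    lambda n, P=[]:
        all(n % p for p in takewhile(lambda p, s=n ** 0.5: p <= s, P))
        and not P.append(n),
    count(2),
)

first = list(islice(prime_gen, 10))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```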
Do you know what in the itertools implementation causes adding an 'if p <= sqrt(n)'
clause to *decrease* performance, while adding a 'takewhile()' clause *increases* it?
> Of course there is no excuse for writing Python like that :)
+1 (but it's fun every once in a while, eh?)
Thanks a lot! -John
> Do you know what in the itertools implementation causes adding an 'if
> p <= sqrt(n)' clause to *decrease* performance, while adding a
> 'takewhile()' clause *increases* it?
I haven't timed it, but I would guess that the takewhile was faster
only because the sqrt(n) had been factored out of the loop. Try the
original loop again, precalculating sqrt(n), and see how that compares.
Most likely that's the explanation.
--
Arnaud
**** all prev. primes
3.97347791658
**** prev. primes that don't exceed SQRT
6.23250413579
**** [CACHED] prev. primes that don't exceed SQRT
3.45557325672
**** [TAKEWHILE] prev. primes that don't exceed SQRT
0.371197373201
**** [TAKEWHILE] squares, using *, of prev. primes that don't exceed N
0.358001074011
**** [TAKEWHILE] squares, using **, of prev. primes that don't exceed N
0.383540147515
**** [TAKEWHILE, CACHED] squares, using *, of prev. primes that don't exceed N
0.410309506343
**** [TAKEWHILE, CACHED] squares, using **, of prev. primes that don't exceed N
0.401269222462
So ... adding the SQRT optimization degrades the performance
significantly, but adding CACHING to the SQRT optimization swings the
performance pendulum back to the "benefit" side. TAKEWHILE is a big win,
but adding CACHING to TAKEWHILE degrades performance.
The code (110 lines) is at http://cl1p.net/python_prime_generators/
-John
Which of course is rubbish. Extracting the sqrt will have an effect, but
the main factor is that takewhile exits the loop as soon as the condition
is false, whereas a conditional in a generator expression doesn't stop
the loop from continuing to the end.
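The difference is easy to observe (my sketch): wrap the source in something that records each element actually pulled, and compare how far each form reads.

```python
from itertools import takewhile

def traced(items, log):
    """Yield items, recording every element actually pulled."""
    for x in items:
        log.append(x)
        yield x

data = [1, 2, 3, 4, 5, 6]

pulled_if = []
kept_if = [x for x in traced(data, pulled_if) if x <= 3]

pulled_tw = []
kept_tw = list(takewhile(lambda x: x <= 3, traced(data, pulled_tw)))

# Both keep [1, 2, 3], but the `if` form pulls all six elements while
# takewhile stops after the first failing one (4).
```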
Absolutely! Since you hadn't quoted the original code, I'd forgotten
that it wasn't stopping after n**0.5.
--
Arnaud
-1 entertaining, list. Come on, people, the show must go on. Take
five.