Inspired by recent threads (and recalling my first message to Python edu-sig), I did some Internet searching on producing prime numbers using Python generators. Most algorithms I found don't go for the infinite, contenting themselves with "list all the primes below a given number".
Here's a very Pythonic (IMHO) implementation that keeps going and going and going ...:
from itertools import count from math import sqrt
def prime_gen(): """ Generate all prime numbers """ primes = [] for n in count(2): if all(n%p for p in primes if p < sqrt(n)): primes.append(n) yield n
The use of all() is particularly nifty (see http://code.activestate.com/recipes/576640/). And so is the way in which the list comprehension easily incorporates the sqrt(n) optimization.
Question: Is there a way to implement this algorithm using generator expressions only -- no "yield" statements allowed?
BTW -- thank you, John Machin, for the spanking ('I'd worry about "correct" before "Pythonic"') in a previous message. I *did* manage to post a correction to the needless conversion of a string to a list:
b in list("\r\n\t")
... a few hours before your message arrived (Wed Apr 1 19:44:01 CEST 2009)!
> Inspired by recent threads (and recalling my first message to Python > edu-sig), I did some Internet searching on producing prime numbers using > Python generators. Most algorithms I found don't go for the infinite, > contenting themselves with "list all the primes below a given number".
> Here's a very Pythonic (IMHO) implementation that keeps going and going > and > going ...:
> from itertools import count > from math import sqrt
> def prime_gen(): > """ > Generate all prime numbers > """ > primes = [] > for n in count(2): > if all(n%p for p in primes if p < sqrt(n)): > primes.append(n) > yield n
John Posner wrote: > Inspired by recent threads (and recalling my first message to Python > edu-sig), I did some Internet searching on producing prime numbers using > Python generators. Most algorithms I found don't go for the infinite, > contenting themselves with "list all the primes below a given number".
> Here's a very Pythonic (IMHO) implementation that keeps going and going and > going ...:
> from itertools import count > from math import sqrt
> def prime_gen(): > """ > Generate all prime numbers > """ > primes = [] > for n in count(2): > if all(n%p for p in primes if p < sqrt(n)): > primes.append(n) > yield n
> The use of all() is particularly nifty (see > http://code.activestate.com/recipes/576640/). And so is the way in which the > list comprehension easily incorporates the sqrt(n) optimization.
> Question: Is there a way to implement this algorithm using generator > expressions only -- no "yield" statements allowed?
No. You refer to the list being build in the code for building the list (very cute), which requires that the list be bound to a name at the start of the process rather than just when complete (which is never ;-).
On Sat, Apr 4, 2009 at 2:50 PM, John Posner wrote: > Inspired by recent threads (and recalling my first message to Python > edu-sig), I did some Internet searching on producing prime numbers using > Python generators. Most algorithms I found don't go for the infinite, > contenting themselves with "list all the primes below a given number".
> Here's a very Pythonic (IMHO) implementation that keeps going and going and > going ...:
> from itertools import count > from math import sqrt
> def prime_gen(): > """ > Generate all prime numbers > """ > primes = [] > for n in count(2): > if all(n%p for p in primes if p < sqrt(n)): > primes.append(n) > yield n
> The use of all() is particularly nifty (see > http://code.activestate.com/recipes/576640/). And so is the way in which the > list comprehension easily incorporates the sqrt(n) optimization.
> Question: Is there a way to implement this algorithm using generator > expressions only -- no "yield" statements allowed?
def prime_gen(): primes = [] return (primes.append(n) or n for n in count(2) if all(n%p for p in primes if p<=sqrt(n)))
That version is only marginally faster than your original version. The biggest performance penalty is that the (... for p in primes ...) generator isn't aborted once p > sqrt(n). Here's a less nifty but much more efficient version:
def prime_gen(): prime_list = [2] for p in prime_list: yield p for n in itertools.count(prime_list[-1] + 1): for p in prime_list: if p * p > n: prime_list.append(n) yield n break elif n % p == 0: break else: raise Exception("Shouldn't have run out of primes!")
When generating the first 1000 primes, this version's approximately 20 times faster; for the first 10,000 primes, ~80x (but still much slower than a simple Sieve of Eratosthenes). To make repeated calls faster, move prime_list = [2] outside the function.
-Miles
P.S. Gmail shows all your messages with a blank body and a text attachment containing your message; this is perhaps because your mailer includes an invalid blank Content-disposition header.
> Question: Is there a way to implement this algorithm using generator > expressions only -- no "yield" statements allowed?
Yes. Avoiding the yield statement is easy but one might eventually end up with two statements because one has to produce a side effect on the primes list. However we can use default parameters in lambdas and finally get a single expression which is a generator expression:
g = (lambda primes = []: (n for n in count(2) \ if (lambda n, primes: (n in primes if primes and n<=primes [-1] \ else (primes.append(n) or True \ if all(n%p for p in primes if p <= sqrt(n)) \ else False)))(n, primes)))()
On Sun, 05 Apr 2009 00:28:17 -0400, Miles wrote: > def prime_gen(): > primes = [] > return (primes.append(n) or n for n in count(2) if all(n%p for p > in primes if p<=sqrt(n)))
> That version is only marginally faster than your original version. The > biggest performance penalty is that the (... for p in primes ...) > generator isn't aborted once p > sqrt(n).
That's a big performance penalty in practice. For n=100, it means that ninety percent of the loop iterations are pointless; for n = 10,000 it means that 99% of the loops are pointless; asymptotically, it means that almost all the work done is wasted.
Consequently, retrieving the next prime becomes slower and slower as you iterate over the generator. By my tests, it is about 40 times slower to go from the 2000th prime to the 2001st prime (17389 and 17393) than it is to go from the sixth prime to the seventh (13 and 17).
In other words, this is a crappy way to generate primes. It's a trick, not a good algorithm: slow and inefficient, as well as obfuscated.
> Here's a less nifty but much more efficient version:
> def prime_gen(): > prime_list = [2] > for p in prime_list: yield p > for n in itertools.count(prime_list[-1] + 1): > for p in prime_list: > if p * p > n: > prime_list.append(n) > yield n > break > elif n % p == 0: > break > else: > raise Exception("Shouldn't have run out of primes!")
That's pretty sweet, but we can make it even faster. We can speed things up by incrementing by two instead of one, to avoid pointlessly testing even numbers that we know must fail. We can also speed things up a tad by making the common test first, and by storing squares rather than recalculating them each time through the loop.
def prime_gen2(): yield 2 n = 3 yield n prime_list = [(2, 4), (3, 9)] it = itertools.count(4) for n in it: n = it.next() for p,p2 in prime_list: if n % p == 0: break elif p2 > n: prime_list.append((n, n*n)) yield n break else: raise RuntimeError("Shouldn't have run out of primes!")
Generating the first million primes is 10% faster using this compared to your improved version.
> g = (lambda primes = []: > (n for n in count(2) \ > if > (lambda n, primes: (n in primes if primes and n<=primes[-1] \ > else > (primes.append(n) or True \ > if all(n%p for p in primes if p <= sqrt(n)) \ > else False)))(n, primes)))()
Um ... did you say "easy"? :-) This is prodigious, and it's definitely a solution to the generator-expression-only challenge I laid down. But I think this is a case in which using a generator expression causes a loss in "Pythonicity", rather than a gain. Many, many thanks for this!
Steven D'Aprano said:
> In other words, this is a crappy way to generate primes. It's a trick, > not a good algorithm: slow and inefficient, as well as obfuscated.
I wasn't thinking at all about efficiency, so I'm not surprised that my implementation -- a pretty simple adaptation of http://code.activestate.com/recipes/576640 -- scores low in that dimension. I was mainly looking for Pythonicity. I'm sure this means different things to different people, PEP 20 notwithstanding. To me, this is an important aspect of Pythonicity:
The final Python code looks like your initial back-of-the-napkin pseudo code.
In the old days, that meant using filter(), because this function mirrors the "sieve" concept in the Sieve of Eratosthenes. These days, I believe that this condition:
if all(n%p for p in primes if p < sqrt(n)):
... closely resembles this pseudo code:
if the number cannot be evenly divided by all the primes we've generated so far (and we can stop testing at the square root)
So I believe this algorithm is the *opposite* of obfuscated. It's crystal clear (albeit a slowpoke).
Miles (and others) said:
> P.S. Gmail shows all your messages with a blank body and a text > attachment containing your message; this is perhaps because your > mailer includes an invalid blank Content-disposition header.
It might be my ISP (AT&T Yahoo), or it might be MS-Outlook. I'm sending this message using Thunderbird 2.0.0.21. Any better?
On 5 Apr., 17:14, John Posner <jjpos...@snet.net> wrote:
> Kay Schluehr said:
> > g = (lambda primes = []: > > (n for n in count(2) \ > > if > > (lambda n, primes: (n in primes if primes and > n<=primes[-1] \ > > else > > (primes.append(n) or True \ > > if all(n%p for p in primes if p <= sqrt(n)) \ > > else False)))(n, primes)))()
> Um ... did you say "easy"? :-) This is prodigious, and it's definitely a > solution to the generator-expression-only challenge I laid down.
That's because it is *one* expression. The avoidance of named functions makes it look obfuscated or prodigious. Once it is properly dissected it doesn't look that amazing anymore.
Start with:
(n for n in count(2) if is_prime(n, primes))
The is_prime function has following implementation:
def is_prime(n, primes): if primes and n<=primes[-1]: return n in primes else: if all(n%p for p in primes if p <= sqrt(n)): primes.append(n) return True else: return False
But we need a lambda-fied function expression instead of the function definition. This yields:
is_prime = lambda n, primes: (n in primes \ if primes and n<=primes[-1] \ else (primes.append(n) or True \ if all(n%p for p in primes if p <= sqrt(n)) \ else False))
Finally the trick with primes definition within an expression by means of an optional argument in the lambda is applied.
> But I > think this is a case in which using a generator expression causes a loss > in "Pythonicity", rather than a gain. Many, many thanks for this!
I think so too. The solution with the simple generator expression and the fully defined is_prime function may be just adequate.
[mailto:python-list-bounces+nick=stinemates....@python.org] On Behalf Of Kay Schluehr Sent: Sunday, April 05, 2009 8:37 AM To: python-l...@python.org Subject: Re: Generators/iterators, Pythonicity, and primes
On 5 Apr., 17:14, John Posner <jjpos...@snet.net> wrote: > Kay Schluehr said:
> > g = (lambda primes = []: > > (n for n in count(2) \ > > if > > (lambda n, primes: (n in primes if primes and > n<=primes[-1] \ > > else > > (primes.append(n) or True \ > > if all(n%p for p in primes if p <= sqrt(n)) \ > > else False)))(n, primes)))()
> Um ... did you say "easy"? :-) This is prodigious, and it's definitely a > solution to the generator-expression-only challenge I laid down.
That's because it is *one* expression. The avoidance of named functions makes it look obfuscated or prodigious. Once it is properly dissected it doesn't look that amazing anymore.
Start with:
(n for n in count(2) if is_prime(n, primes))
The is_prime function has following implementation:
def is_prime(n, primes): if primes and n<=primes[-1]: return n in primes else: if all(n%p for p in primes if p <= sqrt(n)): primes.append(n) return True else: return False
But we need a lambda-fied function expression instead of the function definition. This yields:
is_prime = lambda n, primes: (n in primes \ if primes and n<=primes[-1] \ else (primes.append(n) or True \ if all(n%p for p in primes if p <= sqrt(n)) \ else False))
Finally the trick with primes definition within an expression by means of an optional argument in the lambda is applied.
> But I > think this is a case in which using a generator expression causes a loss > in "Pythonicity", rather than a gain. Many, many thanks for this!
> That's because it is *one* expression. The avoidance of named > functions makes it look obfuscated or prodigious. Once it is properly > dissected it doesn't look that amazing anymore. > > Start with: > > (n for n in count(2) if is_prime(n, primes)) > > The is_prime function has following implementation: > > def is_prime(n, primes): > if primes and n<=primes[-1]: > return n in primes > else: > if all(n%p for p in primes if p <= sqrt(n)): > primes.append(n) > return True > else: > return False
Your explication is excellent, Kay! In the is_prime() function, can't we omit the first three lines (if ... else), because it will *always* be the case that "n" exceeds all the primes we've gathered so far. I've tested the code with these lines omitted -- both with the separate is_prime() function and with the generator-expression-only implementation. It seems to work fine. Ex:
--- from itertools import count from math import sqrt
g = (lambda primes = []: (n for n in count(2) if (lambda x, primes: (primes.append(x) or True if all(x%p for p in primes if p <= sqrt(x)) else False) )(n, primes) ) )()
> > That's because it is *one* expression. The avoidance of named > > functions makes it look obfuscated or prodigious. Once it is properly > > dissected it doesn't look that amazing anymore.
> > Start with:
> > (n for n in count(2) if is_prime(n, primes))
> > The is_prime function has following implementation:
> > def is_prime(n, primes): > > if primes and n<=primes[-1]: > > return n in primes > > else: > > if all(n%p for p in primes if p <= sqrt(n)): > > primes.append(n) > > return True > > else: > > return False
> Your explication is excellent, Kay! In the is_prime() function, can't we > omit the first three lines (if ... else), because it will *always* be > the case that "n" exceeds all the primes we've gathered so far.
Yes, sure. I developed is_prime as a standalone function with the primes list as a cache and just lambda-fied it.
> I've > tested the code with these lines omitted -- both with the separate > is_prime() function and with the generator-expression-only > implementation. It seems to work fine. Ex:
> --- > from itertools import count > from math import sqrt
> g = (lambda primes = []: > (n for n in count(2) if > (lambda x, primes: > (primes.append(x) or True > if all(x%p for p in primes if p <= sqrt(x)) > else False) > )(n, primes) > ) > )()
This is of course much more accessible and doesn't even look weird.
>> > g = (lambda primes = []: >> > (n for n in count(2) if >> > (lambda x, primes: >> > (primes.append(x) or True >> > if all(x%p for p in primes if p <= sqrt(x)) >> > else False) >> > )(n, primes) >> > ) >> > )() >> >> This is of course much more accessible and doesn't even look weird. >>
Thanks! I think the only weirdness remaining is:
primes.append(x) or True
... which is a trick to force a True value after primes.append(x) returns None.
BTW, for extra cleanliness, the parentheses can be omitted around lines 4-7, the inner lambda expression.
Steven D'Aprano wrote: > On Sun, 05 Apr 2009 00:28:17 -0400, Miles wrote: >> def prime_gen(): ... > That's pretty sweet, but we can make it even faster. We can speed things > up by incrementing by two instead of one, to avoid pointlessly testing > even numbers that we know must fail. We can also speed things up a tad by > making the common test first, and by storing squares rather than > recalculating them each time through the loop.
> def prime_gen2(): > yield 2 > n = 3 > yield n > prime_list = [(2, 4), (3, 9)] > it = itertools.count(4) > for n in it: > n = it.next() > for p,p2 in prime_list: > if n % p == 0: > break > elif p2 > n: > prime_list.append((n, n*n)) > yield n > break > else: > raise RuntimeError("Shouldn't have run out of primes!")
Here is another speedup (from my prime pair hunt days): Check only 2 of every 6 integers. [6N + 0, 6N + 2, 6N + 4] are divisible by 2, [6N + 0, 6N + 3] by 3.
def prime_gen3(): check_list = [(5, 25)] yield 2 yield 3 yield 5 n = 5 for incr in itertools.cycle((2, 4)): n += incr for p, p_squared in check_list: if p_squared > n: check_list.append((n, n * n)) yield n break elif n % p == 0: break else: raise Exception("Shouldn't have run out of primes!")
John Posner <jjpos...@snet.net> writes: > Inspired by recent threads (and recalling my first message to Python > edu-sig), I did some Internet searching on producing prime numbers using > Python generators. Most algorithms I found don't go for the infinite, > contenting themselves with "list all the primes below a given number".
> Here's a very Pythonic (IMHO) implementation that keeps going and >going and going ...:
> from itertools import count > from math import sqrt
> def prime_gen(): > """ > Generate all prime numbers > """ > primes = [] > for n in count(2): > if all(n%p for p in primes if p < sqrt(n)): > primes.append(n) > yield n
You could do something like this with the help of itertools.ifilter:
prime_gen = ifilter( lambda n, P=[]: all(n%p for p in P) and not P.append(n), count(2) )
I omitted the 'if p <= sqrt(n)' as it doesn't help the generator go faster. You could use itertools.takewhile for that:
prime_gen = ifilter( lambda n, P=[]: all(n%p for p in takewhile(lambda p, s=n**0.5: p<=s, P)) and not P.append(n), count(2) )
Of course there is no excuse for writing Python like that :)
Arnaud Delobelle wrote: > You could do something like this with the help of itertools.ifilter:
> prime_gen = ifilter( > lambda n, P=[]: all(n%p for p in P) and not P.append(n), > count(2) > )
Formidable! (both the English and French meanings) This is the most elegant, Sieve of Eratosthenes-like solution.
> I omitted the 'if p <= sqrt(n)' as it doesn't help the generator go > faster. You could use itertools.takewhile for that:
> prime_gen = ifilter( > lambda n, P=[]: > all(n%p for p in takewhile(lambda p, s=n**0.5: p<=s, P)) > and not P.append(n), > count(2) > )
Do know what in the itertools implementation causes adding a 'if p <= sqrt(n)' clause to *decrease* performance, while adding a 'takewhile()' clause *increases* performance?
> Of course there is no excuse for writing Python like that :)
John Posner <jjpos...@snet.net> wrote: > Do know what in the itertools implementation causes adding a 'if p <= > sqrt(n)' clause to *decrease* performance, while adding a > 'takewhile()' clause *increases* performance?
I haven't timed it, but I would guess that the takewhile was faster only because the sqrt(n) had been factored out of the loop. Try the original loop again precalculating the sqrt(n) and see how that compares.
Duncan Booth <duncan.bo...@invalid.invalid> writes: > John Posner <jjpos...@snet.net> wrote:
>> Do know what in the itertools implementation causes adding a 'if p <= >> sqrt(n)' clause to *decrease* performance, while adding a >> 'takewhile()' clause *increases* performance?
> I haven't timed it, but I would guess that the takewhile was faster > only because the sqrt(n) had been factored out of the loop. Try the > original loop again precalculating the sqrt(n) and see how that compares.
Duncan Booth wrote: > John Posner <jjpos...@snet.net> wrote:
>> Do know what in the itertools implementation causes adding a 'if p <= >> sqrt(n)' clause to *decrease* performance, while adding a >> 'takewhile()' clause *increases* performance?
> I haven't timed it, but I would guess that the takewhile was faster > only because the sqrt(n) had been factored out of the loop. Try the > original loop again precalculating the sqrt(n) and see how that compares.
Here's my "timeit" scorecard, with repeat=5 (only the minimum value reported) and number=5000:
**** all prev. primes 3.97347791658
**** prev. primes that don't exceed SQRT 6.23250413579
**** [CACHED] prev. primes that don't exceed SQRT 3.45557325672
**** [TAKEWHILE] prev. primes that don't exceed SQRT 0.371197373201
**** [TAKEWHILE] squares, using *, of prev. primes that don't exceed N 0.358001074011
**** [TAKEWHILE] squares, using **, of prev. primes that don't exceed N 0.383540147515
**** [TAKEWHILE, CACHED] squares, using *, of prev. primes that don't exceed N 0.410309506343
**** [TAKEWHILE, CACHED] squares, using **, of prev. primes that don't exceed N 0.401269222462
So ... adding the SQRT optimization degrades the performance significantly, but adding CACHING to the SQRT optimization swings the performance pendulum back to the "benefit" side. TAKEWHILE is a big win, but adding CACHING to TAKEWHILE degrades performance.
Duncan Booth <duncan.bo...@invalid.invalid> wrote: > John Posner <jjpos...@snet.net> wrote:
>> Do know what in the itertools implementation causes adding a 'if p <= >> sqrt(n)' clause to *decrease* performance, while adding a >> 'takewhile()' clause *increases* performance?
> I haven't timed it, but I would guess that the takewhile was faster > only because the sqrt(n) had been factored out of the loop. Try the > original loop again precalculating the sqrt(n) and see how that compares.
Which of course is rubbish, extracting the sdqrt will have an effect but the main factor is that takewhile exits the loop as soon as the condition is false whereas a conditional in a generator comprehension doesn't stop the loop continuing to the end.
>>> Do know what in the itertools implementation causes adding a 'if p <= >>> sqrt(n)' clause to *decrease* performance, while adding a >>> 'takewhile()' clause *increases* performance?
>> I haven't timed it, but I would guess that the takewhile was faster >> only because the sqrt(n) had been factored out of the loop. Try the >> original loop again precalculating the sqrt(n) and see how that compares.
> Which of course is rubbish, extracting the sdqrt will have an effect but > the main factor is that takewhile exits the loop as soon as the condition > is false whereas a conditional in a generator comprehension doesn't stop > the loop continuing to the end.
Absolutely! Since you hadn't quoted the original code, I'd forgotten that it wasn't stopping after n**0.5.
> >>> Do know what in the itertools implementation causes adding a 'if p <= > >>> sqrt(n)' clause to *decrease* performance, while adding a > >>> 'takewhile()' clause *increases* performance?
> >> I haven't timed it, but I would guess that the takewhile was faster > >> only because the sqrt(n) had been factored out of the loop. Try the > >> original loop again precalculating the sqrt(n) and see how that compares.
> > Which of course is rubbish, extracting the sdqrt will have an effect but > > the main factor is that takewhile exits the loop as soon as the condition > > is false whereas a conditional in a generator comprehension doesn't stop > > the loop continuing to the end.
> Absolutely! Since you hadn't quoted the original code, I'd forgotten > that it wasn't stopping after n**0.5.
> -- > Arnaud
-1 entertaining, list. Come on, people, the show must go on. Take five.