
can multi-core improve a single function?


oyster

Feb 10, 2009, 1:28:15 AM
to pytho...@python.org
I mean this
[code]
def fib(n):
    if n<=1:
        return 1
    return fib(n-1)+fib(n-2)

useCore(1)
timeit(fib(500)) #this show 20 seconds

useCore(2)
timeit(fib(500)) #this show 10 seconds
[/code]

Is it possible?

and more, can threads/multi-processors/clusters be used to improve fib?

thanx

Chris Rebert

Feb 10, 2009, 1:43:50 AM
to oyster, pytho...@python.org

Considering the GIL, I highly doubt it. Also, the algorithm is
fundamentally linear -- each result depends on previous results -- so
I don't think you can easily parallelize it (if at all). You'd
probably get greater speedup using memoization.

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com

Steven D'Aprano

Feb 10, 2009, 2:34:54 AM
to
On Tue, 10 Feb 2009 14:28:15 +0800, oyster wrote:

> I mean this
> [code]
> def fib(n):
>     if n<=1:
>         return 1
>     return fib(n-1)+fib(n-2)
>
> useCore(1)
> timeit(fib(500)) #this show 20 seconds
>
> useCore(2)
> timeit(fib(500)) #this show 10 seconds [/code]
>
> Is it possible?

What does useCore(n) mean?


> and more, can threads/multi-processors/clusters be used to improve fib?

No. The best way to improve fib is to improve fib, not to try running it
on faster hardware.

Your algorithm makes *many* recursive calls to fib() for each call:

fib(1) => 1 function call
fib(2) => 3 function calls
fib(3) => 5 function calls
fib(4) => 9 function calls
fib(5) => 15 function calls
fib(6) => 25 function calls
fib(7) => 41 function calls
fib(8) => 67 function calls
fib(9) => 109 function calls
...
fib(34) => 18454929 function calls
fib(35) => 29860703 function calls
fib(36) => 48315633 function calls
...
fib(498) => 1.723e+104 function calls
fib(499) => 2.788e+104 function calls
fib(500) => 4.511e+104 function calls
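
(For the record, those counts satisfy calls(n) = 1 + calls(n-1) + calls(n-2)
with calls(0) = calls(1) = 1, so they can be tabulated without running fib at
all; a rough sketch:)

# Tabulate calls(n) for the naive recursion:
# calls(0) = calls(1) = 1, calls(n) = 1 + calls(n-1) + calls(n-2).
a = b = 1
for n in range(2, 501):
    a, b = b, a + b + 1
print b   # calls(500), roughly 4.5e+104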

Calling fib(500) the way you do will make more than 450 thousand billion
billion billion billion billion billion billion billion billion billion
billion function calls. You claim that you calculated it in just 20
seconds. On my PC, it takes over a minute to calculate fib(38). I
estimate it would take over five hours to calculate fib(50), and fib(100)
would probably take 16 million years. I call shenanigans.

--
Steven

Paul McGuire

Feb 10, 2009, 2:38:01 AM
to
On Feb 10, 12:43 am, Chris Rebert <c...@rebertia.com> wrote:
> Considering the GIL, I highly doubt it so. Also, the algorithm is
> fundamentally linear -- each result depends on previous results -- so
> I don't think you can easily parallelize it (if at all). You'd
> probably get greater speedup using memoization.
>
Even worse than linear, the function is recursive, which, as I
understand it, is inherently a no-no when looking for code that is
parallel-friendly.

If you want multi-core to have some benefit, then your program must
have a way for separate parts of the problem to be processed
independently. Many matrix processing problems fall into this camp,
since the various cells of the matrix can be worked on separately from
all others. Prime number searchers break up a problem by having
separate processes scan different number ranges independently, or have
multiple processes perform rough-pass prime filters, which write
possible primes to another queue to be more exhaustively checked.
This saves the expensive/exhaustive check from being done on every
number. Fractal graphics generators (Mandelbrot set plots) can break
up the graphics range among different processors, or keep a "queue" of
pixels to generate, and each calculator process pulls from the queue
until all pixels are computed (some pixels take longer to compute than
others).
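
As a rough illustration of the "independent work units" idea (just a sketch;
the worker function here is an arbitrary stand-in for any expensive per-item
computation):

from multiprocessing import Pool

def is_prime(n):
    # Trial division -- stands in for any expensive, independent
    # per-item job (a pixel, a matrix cell, a candidate number, ...).
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

if __name__ == '__main__':
    pool = Pool()                            # one worker per core by default
    candidates = range(10**6, 10**6 + 1000)
    # Each candidate is checked independently, so the work spreads
    # across the cores with no coordination beyond the final gather.
    flags = pool.map(is_prime, candidates)
    print sum(flags), "primes in the range"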

-- Paul


Gerhard Weis

Feb 10, 2009, 3:18:23 AM
to
I once saw Haskell code that ran fibonacci on a 4 core system.

The algorithm itself basically used an extra granularity parameter down to
which new threads will be sparked. e.g. fib(40, 36) means: calculate
fib(40) and spark threads until n=36.
1. divide: fib(n-1), fib(n-2)
2. divide: fib(n-2), fib(n-3)
3. divide: fib(n-3), fib(n-4)

We tried this code on a 4 core machine using only 1 core and all 4 cores.
1 core wallclock: ~10s
4 core wallclock: ~3s

So there is a significant speedup, but I guess this only works with
Haskell out of the box.
I have not tried it, but I think you can simulate Haskell's behaviour
by caching the results of fib(n).
(We tried the same algorithm in Java. While it utilized all 4 cores,
there was no speedup. This is why I think that result caching is the
only way to speed up fibonacci.)
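
In Python, the sparking scheme described above might be sketched with
multiprocessing roughly like this (illustrative only -- worker processes are
far heavier than Haskell's sparks, and the cutoff of 36 is just the value
from the example above):

from multiprocessing import Pool

def fib(n):
    # Plain naive recursion; this is what the worker processes run.
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def split(n, cutoff):
    # Divide fib(n) into independent subproblems of size <= cutoff,
    # mirroring the "spark until n = cutoff" idea.
    if n <= cutoff:
        return [n]
    return split(n - 1, cutoff) + split(n - 2, cutoff)

if __name__ == '__main__':
    pool = Pool()
    parts = split(40, 36)               # fib(40, 36) in the notation above
    print sum(pool.map(fib, parts))     # the partial results sum to fib(40)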

cheers

Gerhard


On 2009-02-10 17:34:54 +1000, Steven D'Aprano

Steven D'Aprano

Feb 10, 2009, 3:45:37 AM
to
On Tue, 10 Feb 2009 18:18:23 +1000, Gerhard Weis wrote:

> Once I have seen Haskell code, that ran fibonacci on a 4 core system.
>
> The algorithm itself basically used an extra granularity paramet until
> which new threads will be sparked. e.g. fib(40, 36) means, calculate
> fib(40) and spark threads until n=36. 1. divide: fib(n-1), fib(n-2)
> 2. divide: fib(n-2), fib(n-3)
> 3. divide: fib(n-3), fib(n-4)
>
> We tried this code on a 4 core machine using only 1 core and all 4
> cores. 1 core wallclock: ~10s
> 4 core wallclock: ~3s

Three seconds to calculate fib(40) is terrible. Well, it's a great saving
over ten seconds, but it's still horribly, horribly slow.

Check this out, on a two-core 2.2 GHz system:


>>> import timeit
>>> timeit.Timer('fib(40)', 'from __main__ import fib').timeit(1)
7.2956085205078125e-05

That's five orders of magnitude faster. How did I do it? I used a better
algorithm.


def fib(n, a=1, b=1):
    if n==0:
        return a
    elif n==1:
        return b
    return fib(n-1, b, a+b)


And for the record:

>>> fib(500)
225591516161936330872512695036072072046011324913758190588638866418474627738686883405015987052796968498626L
>>> timeit.Timer('fib(500)', 'from __main__ import fib').timeit(1)
0.00083398818969726562

Less than a millisecond, versus millions of years for the OP's algorithm.
I know which I would choose. Faster hardware can only go so far in hiding
the effect of poor algorithms.

--
Steven

Gabriel Genellina

Feb 10, 2009, 4:32:35 AM
to pytho...@python.org
On Tue, 10 Feb 2009 06:45:37 -0200, Steven D'Aprano
<ste...@remove.this.cybersource.com.au> wrote:

> On Tue, 10 Feb 2009 18:18:23 +1000, Gerhard Weis wrote:
>
>> We tried this code on a 4 core machine using only 1 core and all 4
>> cores. 1 core wallclock: ~10s
>> 4 core wallclock: ~3s
>
> Three seconds to calculate fib(40) is terrible. Well, it's a great saving
> over ten seconds, but it's still horribly, horribly slow.
>
> Check this out, on a two-core 2.2 GHz system:
>
>
>>>> import timeit
>>>> timeit.Timer('fib(40)', 'from __main__ import fib').timeit(1)
> 7.2956085205078125e-05
>
> That's five orders of magnitude faster. How did I do it? I used a better
> algorithm.
>
>
> def fib(n, a=1, b=1):
>     if n==0:
>         return a
>     elif n==1:
>         return b
>     return fib(n-1, b, a+b)

I agree with your comment, but this is not the right way to write it in
Python. This style is fine in a language with tail-call optimization --
but fib(1000) raises `RuntimeError: maximum recursion depth exceeded`

The same thing after removing recursion can compute the 2089 digits of
fib(10000) without problems:

def fib(n):
    a = b = 1
    while n:
        n, a, b = n-1, b, a+b
    return a

> Less than a millisecond, versus millions of years for the OP's algorithm.
> I know which I would choose. Faster hardware can only go so far in hiding
> the effect of poor algorithms.

Completely agree!

--
Gabriel Genellina

Chris Rebert

Feb 10, 2009, 4:43:20 AM
to Steven D'Aprano, pytho...@python.org

Indeed. And apparently memoization can hide it also:

$ #memoization added to OP's code
$ python -m timeit -s 'from memoized_naive_recursive_version import fib' 'fib(40)'
1000000 loops, best of 3: 0.639 usec per loop
$ #your code
$ python -m timeit -s 'from your_version import fib' 'fib(40)'
100000 loops, best of 3: 15.4 usec per loop
$ cat memoized_naive_recursive_version.py
def memoize(func):
    cache = {}
    def memoized(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result
    return memoized

@memoize
def fib(n):
    if n <= 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

However, the naive recursive version fails by exceeding the maximum
recursion depth if N is too large, whereas your version continues to
succeed for a much higher limit of N.

Niklas Norrthon

Feb 10, 2009, 5:05:35 AM
to
On 10 Feb, 09:45, Steven D'Aprano

According to the common definition of fibonacci numbers fib(0) = 0,
fib(1) = 1 and fib(n) = fib(n-1) + fib(n-2) for n > 1. So the number
above is fib(501).

>>> timeit.Timer('fib(500)', 'from __main__ import fib').timeit(1)
>
> 0.00083398818969726562

And now for my version (which admittedly isn't really mine, and returns
slightly incorrect fib(n) for large values of n, due to the limited
floating point precision).

>>> import math
>>> sqrt5 = math.sqrt(5)
>>> golden_section = (sqrt5 + 1) / 2
>>> def fib(n):
...     return int(golden_section ** n / sqrt5 + 0.5)

>>> fib(501)
225591516161940121919323945317755919750165306733355143970858461525064153963081278412888159063487104942080L

>>> timeit.Timer('fib(501)', 'from __main__ import fib').timeit(1)
1.9555558083084179e-05

>
> Less than a millisecond, versus millions of years for the OP's algorithm.
> I know which I would choose. Faster hardware can only go so far in hiding
> the effect of poor algorithms.
>

I agree 100%

/Niklas Norrthon

Gerhard Weis

Feb 10, 2009, 7:41:25 AM
to
> def fib(n, a=1, b=1):
>     if n==0:
>         return a
>     elif n==1:
>         return b
>     return fib(n-1, b, a+b)
>
>
> And for the record:
>
>>>> fib(500)
> 225591516161936330872512695036072072046011324913758190588638866418474627738686883405015987052796968498626L
>>>> timeit.Timer('fib(500)', 'from __main__ import fib').timeit(1)
> 0.00083398818969726562
> Less than a millisecond, versus millions of years for the OP's algorithm.
> I know which I would choose. Faster hardware can only go so far in hiding
> the effect of poor algorithms.

I agree 100% with this.

btw. the timings are not that different for the naive recursion in
OP's version and yours.
fib(500) on my machine:
OP's: 0.00116 (far away from millions of years)
This here: 0.000583
Gabriel's while loop: 0.000246

The fastest algorithm I've found does fib(1000000) in 0.25 seconds on my machine.
However, I have not found a way to parallelize this algorithm and I
think it is not possible.

To come back to the original question: yes, it is possible; however
there are quite a few restrictions on it. In the case of the naive
recursion it was possible to get a speedup of more than 3 times on a 4
core machine. With the highly optimised version mentioned above, I had
no success in utilizing more than one core.

In conclusion I would say: yes, a single function can profit from
multiple cores, but sometimes a better algorithm delivers better
results than using more cores. However, it is possible that a slower
but parallelizable algorithm might outperform the best serial
algorithm, with enough cores :).

Gerhard

Lie Ryan

Feb 10, 2009, 7:43:20 AM
to pytho...@python.org
On Tue, 10 Feb 2009 14:28:15 +0800, oyster wrote:

> I mean this
> [code]
> def fib(n):
>     if n<=1:
>         return 1
>     return fib(n-1)+fib(n-2)
>
> useCore(1)
> timeit(fib(500)) #this show 20 seconds
>
> useCore(2)
> timeit(fib(500)) #this show 10 seconds [/code]
>
> Is it possible?
>

> and more, can threads/multi-processors/clusters be used to improve fib?
>

> thanx

Of course a multi-core processor can improve a single function using
multiprocessing, as long as the function is parallelizable. The Fibonacci
function is not a parallelizable function though.

However, there are ways to improve your fibonacci function. Either drop
the recursion or implement memoization. An imperative fibonacci is
much faster than a naive recursive one because a naive recursive
fibonacci function is O(2**n) while the imperative one is O(n). Memoization
must be used to help recursive fibonacci become O(n).

Steven D'Aprano

Feb 10, 2009, 4:58:36 PM
to
On Tue, 10 Feb 2009 12:43:20 +0000, Lie Ryan wrote:

> Of course multi-core processor can improve single function using
> multiprocessing, as long as the function is parallelizable. The
> Fibonacci function is not a parallelizable function though.

As I understand it, there's very little benefit to multi-cores in Python
due to the GIL.


--
Steven.

Steven D'Aprano

Feb 10, 2009, 5:01:29 PM
to
On Tue, 10 Feb 2009 22:41:25 +1000, Gerhard Weis wrote:

> btw. the timings are not that different for the naive recursion in OP's
> version and yours.
> fib(500) on my machine:
> OP's: 0.00116 (far away from millions of years)
> This here: 0.000583

I don't believe those timings are credible. On my machine, it took a
minute to calculate fib(38), and while my machine isn't exactly the
fastest box possible, nor is it especially slow.

I don't wish to imply that you are deliberately lying, but your result of
0.00116 seconds for the naive version of fib(500) is so unrealistic in my
experience that I believe you must be confused. Perhaps you've timed a
less naive fib() but thought it was the naive version.

Unless somebody can point out an error in my analysis, I'm sticking to my
earlier claim that the naive version of fib(500) requires an unbelievably
huge number of function calls: significantly more than the value of fib
(500) itself. See my earlier post in this thread for details.

--
Steven

Steven D'Aprano

Feb 10, 2009, 5:26:39 PM
to
On Tue, 10 Feb 2009 02:05:35 -0800, Niklas Norrthon wrote:

> According to the common definition of fibonacci numbers fib(0) = 0,
> fib(1) = 1 and fib(n) = fib(n-1) + fib(n-2) for n > 1. So the number above
> is fib(501).

So it is. Oops, off by one error!

Or, if you prefer, it's the right algorithm for a Lucas sequence with
first two values 1 and 1 instead of 0 and 1. :)

>>>> timeit.Timer('fib(500)', 'from __main__ import fib').timeit(1)
>>
>> 0.00083398818969726562
>
> And now for my version (which admittedly isn't really mine, and returns
> slightly incorrect fib(n) for large values of n, due to the limited
> floating point precision).

The floating point version is nice, but it starts giving incorrect
answers relatively early, from n=71. But if you don't need accurate
results (a relative error of 3e-15 for n=71), it is very fast.
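
(A quick way to see where the float formula first goes wrong is to compare
it against an exact iterative fib -- a throwaway sketch:)

import math
sqrt5 = math.sqrt(5)
golden_section = (sqrt5 + 1) / 2

def fib_float(n):
    return int(golden_section ** n / sqrt5 + 0.5)

# Walk up from n=0 with exact integer arithmetic until the two disagree.
a, b = 0, 1
n = 0
while fib_float(n) == a:
    a, b = b, a + b
    n += 1
print n, fib_float(n) - a   # first failing n and its absolute error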

--
Steven

Gerhard Weis

Feb 10, 2009, 5:31:46 PM
to
On 2009-02-11 08:01:29 +1000, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> said:

I am sorry for the wrong timing, I mixed up the function names. The
naive version used partly your version and partly the naive recursion.
So less naive is a good description :)

after fixing it:
naive fib(38): ~40 seconds

oyster

Feb 10, 2009, 11:21:12 PM
to pytho...@python.org
Hi, guys, my fib(xx) is just an example to show "what is a single
function" and "what is the effect I expect to see when enable
multi-core".

My real purpose is to know "whether multi-core can help to improve the
speed of a common function". But I know definitely that the answer is
NO.

James Mills

Feb 10, 2009, 11:57:26 PM
to oyster, pytho...@python.org
On Wed, Feb 11, 2009 at 2:21 PM, oyster <lepto....@gmail.com> wrote:
> My real purpose is to know "whether multi-core can help to improve the
> speed of a common function". But I know definitely that the answer is
> NO.

As stated by others, and even myself,
it is not possible to just "automagically"
improve the execution speed of a single
function - let alone an application.

Your problem must be capable of being divided up
into work units that can be parallelized. If this is not
possible, multiple cores (no matter how many you have)
-will not- help you.

cheers
James

Chris Rebert

Feb 11, 2009, 12:01:53 AM
to James Mills, pytho...@python.org, oyster

See also Amdahl's Law -- http://en.wikipedia.org/wiki/Amdahl%27s_law
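
(Roughly: if a fraction P of the work can be parallelized over N cores, the
best possible speedup is 1 / ((1 - P) + P/N), so the serial fraction quickly
becomes the bottleneck no matter how many cores you add.)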

Paul Rubin

Feb 11, 2009, 12:51:00 AM
to

Well, it depends on the implementation. Python isn't at the moment
designed for using multicores but as it evolves that may change.

The blog post http://cgi.cse.unsw.edu.au/~dons/blog/2007/11/29
may be of interest re parallelizing that fibonacci function.

Cameron Laird

Feb 17, 2009, 9:30:27 AM
to
In article <pan.2009.02...@REMOVE.THIS.cybersource.com.au>,

Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
.
.
.

>> And now for my version (which admittedly isn't really mine, and returns
>> slightly incorrect fib(n) for large values of n, due to the limited
>> floating point precision).
>
>The floating point version is nice, but it starts giving incorrect
>answers relatively early, from n=71. But if you don't need accurate
>results (a relative error of 3e-15 for n=71), it is very fast.
.
.
.
While my personal opinion is that it's silly to
characterize an error of 3e-15 as not "accurate",
I think more constructive is to focus on the fact
that the closed-form solution can be touched up
to give a precise integral solution, while
retaining its (approximately) O(log n) run-time
cost.

Steven D'Aprano

Feb 18, 2009, 2:24:35 AM
to
On Tue, 17 Feb 2009 14:30:27 +0000, Cameron Laird wrote:

> In article <pan.2009.02...@REMOVE.THIS.cybersource.com.au>,
> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> .
> .
> .
>>> And now for my version (which admittedly isn't really mine, and returns
>>> slightly incorrect fib(n) for large values of n, due to the limited
>>> floating point precision).
>>
>>The floating point version is nice, but it starts giving incorrect
>>answers relatively early, from n=71. But if you don't need accurate
>>results (a relative error of 3e-15 for n=71), it is very fast.
> .
> .
> .
> While my personal opinion is that it's silly to characterize an error of
> 3e-15 as not "accurate",

That's a *relative* error. The absolute error is 1 for n=71, 59 for n=80,
1196093 for n=100, and 1875662300654090482610609259 for n=200. As far as
I can tell, the absolute error grows without bounds as n increases -- and
it overflows at n=1475.

I agree that a relative error is small, and if your use allows it, then
by all means use the inexact floating point function. But if you need
exact results, then you won't get it using floats.

> I think more constructive is to focus on the
> fact that the closed-form solution can be touched up to give a precise
> integral solution, while re- taining its (approximately) O(log n)
> run-time cost.

I doubt that. Since both phi and sqrt(5) are irrational numbers, it would
require an infinite number of integer digits to get precise integral
solutions unless there was some sort of freakish cancellation. But you
probably could get a very good integral solution which gives exact
results up to whatever limit you wanted, bound only by the amount of
memory and time you were willing to spend.

--
Steven

sturlamolden

Feb 18, 2009, 7:50:58 PM
to
On Feb 10, 8:38 am, Paul McGuire <pt...@austin.rr.com> wrote:

> Even worse than linear, the function is recursive, which as I
> understand it, is inherently a no-no when looking for code that is
> parallel-friendly.

There is no way to parallelize Fibonacci numbers computed with linear
time complexity, as we must know fib(n-1) to compute fib(n). But if we
were to use Oyster's recursive version, which has geometric
complexity, one could parallelize with a 'forkjoin' scheme.

Anyhow, this runs in amortized linear time and would be a lot faster:

def fib(n):
    while True:
        try:
            return fib.seq[n]
        except AttributeError:
            fib.seq = [0, 1, 1]
        except IndexError:
            fib.seq.append(fib.seq[-2] + fib.seq[-1])

S.M.

sturlamolden

Feb 18, 2009, 8:36:25 PM
to
On Feb 10, 7:28 am, oyster <lepto.pyt...@gmail.com> wrote:
> I mean this
> [code]
> def fib(n):
>     if n<=1:
>         return 1
>     return fib(n-1)+fib(n-2)


Let's rewrite that slightly and see...

def fib(n, _map=None):
    if not _map: _map = map
    if n > 2:
        return sum(_map(fib, (n-1, n-2)))
    else:
        return 1


if __name__ == '__main__':

    from multiprocessing import Pool
    from time import clock

    p = Pool()
    n = 36

    t0 = clock()
    fib(n, p.map)
    t1 = clock()

    print 'parallel t: %f seconds' % (t1-t0)

    t0 = clock()
    fib(n)
    t1 = clock()

    print 'sequential t: %f seconds' % (t1-t0)


With two cores:

E:\>python fibotest.py
parallel t: 31.300226 seconds
sequential t: 48.070695 seconds


Yes it can!


Sturla Molden

Gabriel Genellina

Feb 19, 2009, 3:54:45 AM
to pytho...@python.org
On Wed, 18 Feb 2009 23:36:25 -0200, sturlamolden <sturla...@yahoo.no>
wrote:

> On Feb 10, 7:28 am, oyster <lepto.pyt...@gmail.com> wrote:

> Let's rewrite that slightly and see...
>
> def fib(n, _map=None):
>     if not _map: _map = map
>     if n > 2:
>         return sum(_map(fib, (n-1, n-2)))
>     else:
>         return 1
>
>

> With two cores:
>
> E:\>python fibotest.py
> parallel t: 31.300226 seconds
> sequential t: 48.070695 seconds
>
>
> Yes it can!

Are you kidding?
This is like building a house one brick at a time, but before placing a
new brick, you throw away all the previous work and start again from scratch.
Certainly two workers would help more than one - but the best way to do it
is *NOT* to repeat all that additional work over and over in the first
place.
Even my Pentium I MMX 233MHz can compute fib(36) thousands of times faster
than that with the right algorithm. So I don't see the point in
parallelizing if you're going to get infinitely worse results...

--
Gabriel Genellina

Paul Rubin

Feb 19, 2009, 5:57:34 AM
to
"Gabriel Genellina" <gags...@yahoo.com.ar> writes:
> Even my Pentium I MMX 233MHz can compute fib(36) thousand of times faster
> than that with the right algorithm. So I don't see the point in
> parallelizing if you're going to get infinitely worse results...

The point is to test the parallelization scheme. Recursive fibonacci
is a standard benchmark for this sort of thing. Sturlamolden's test
may be inspired by a recent similar GHC test:

http://donsbot.wordpress.com/2007/11/29/

Cameron Laird

Feb 19, 2009, 9:33:02 AM
to
.
.
.
There *is* freakish cancellation.

I entirely recognize that Python builds in floating-point
calculations of limited precision. The closed-form
expression for Fibonacci, though, is exact, and can be coded in
terms of, for example, Python infinite-precision integers.
If there's sufficient interest, I'll cheerfully do so,
although only after meeting my own deadlines for February
for commitments outside comp.lang.python.
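
One standard way to get an exact O(log n) Fibonacci entirely in integers is
the fast-doubling recurrence; a minimal sketch (not Cameron's promised code):

def fib_pair(n):
    # Returns (F(n), F(n+1)) using the fast-doubling identities:
    #   F(2k)   = F(k) * (2*F(k+1) - F(k))
    #   F(2k+1) = F(k)**2 + F(k+1)**2
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)
    c = a * (2 * b - a)
    d = a * a + b * b
    if n % 2:
        return (d, c + d)
    return (c, d)

def fib(n):
    return fib_pair(n)[0]

print fib(500)   # exact, and only about nine levels of recursion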

Aahz

Feb 20, 2009, 1:07:15 PM
to
>As I understand it, there's very little benefit to multi-cores in Python
>due to the GIL.

As phrased, your statement is completely wrong. Here's a more correct
phrasing: "For threaded compute-bound applications written in pure
Python, there's very little benefit to multiple cores." But threaded
I/O-bound applications do receive some benefit from multiple cores, and
using multiple processes certainly leverages multiple cores. If boosting
the performance of a threaded compute-bound application is important, one
can always write the critical parts in C/C++.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.

Grant Edwards

Feb 20, 2009, 2:20:48 PM
to
On 2009-02-20, Aahz <aa...@pythoncraft.com> wrote:
> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>
>> As I understand it, there's very little benefit to multi-cores
>> in Python due to the GIL.
>
> As phrased, your statement is completely wrong. Here's a more
> correct phrasing: "For threaded compute-bound applications
> written in pure Python, there's very little benefit to
> multiple cores." But threaded I/O-bound applications do
> receive some benefit from multiple cores, and using multiple
> processes certainly leverages multiple cores. If boosting the
> performance of a threaded compute-bound application is
> important, one can always write the critical parts in C/C++.

Do the crunchy bits of scipy/numpy, scientific python, vtk and
other compute-intensive libraries tend to release the GIL while
they're busy "computing"?

[Perhaps using them doesn't count as "pure Python", but...]

--
Grant Edwards    grante at visi.com    Yow! NEWARK has been REZONED!! DES MOINES has been REZONED!!

Robert Kern

Feb 20, 2009, 4:52:02 PM
to pytho...@python.org
On 2009-02-20 13:20, Unknown wrote:
> On 2009-02-20, Aahz<aa...@pythoncraft.com> wrote:
>> Steven D'Aprano<ste...@REMOVE.THIS.cybersource.com.au> wrote:
>>
>>> As I understand it, there's very little benefit to multi-cores
>>> in Python due to the GIL.
>> As phrased, your statement is completely wrong. Here's a more
>> correct phrasing: "For threaded compute-bound applications
>> written in pure Python, there's very little benefit to
>> multiple cores." But threaded I/O-bound applications do
>> receive some benefit from multiple cores, and using multiple
>> processes certainly leverages multiple cores. If boosting the
>> performance of a threaded compute-bound application is
>> important, one can always write the critical parts in C/C++.
>
> Do the crunchy bits of scipy/numpy, scientific python, vtk and
> other compute-intensive libraries tend to release the GIL while
> they're busy "computing"?

Often. Not as often as they could, sometimes.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Grant Edwards

Feb 20, 2009, 5:27:19 PM
to
On 2009-02-20, Robert Kern <rober...@gmail.com> wrote:
>
>> Do the crunchy bits of scipy/numpy, scientific python, vtk and
>> other compute-intensive libraries tend to release the GIL
>> while they're busy "computing"?
>
> Often. Not as often as they could, sometimes.

On one hand, the upshot of that is that by finding an
appropriate library module you might gain some of the same
benefits as removing the GIL.

On the other hand, that doesn't help if you're doing something
original enough that nobody has written a library to handle
large chunks of it.

And on the grasping hand, I find that most of us vastly
overestimate the originality of what we're doing.

--
Grant Edwards    grante at visi.com    Yow! This is a NO-FRILLS flight -- hold th' CANADIAN BACON!!

Kurt Smith

Feb 20, 2009, 11:50:03 PM
to pytho...@python.org
On Fri, Feb 20, 2009 at 4:27 PM, Grant Edwards <invalid@invalid> wrote:
> On one hand, the upshot of that is that by finding an
> appropriate library module you might gain some of the same
> benefits as removing the GIL.
>
> On the other hand, that doesn't help if you're doing something
> original enough that nobody has written a library to handle
> large chunks of it.
>
> And on the grasping hand, I find that most of us vastly
> overestimate the originality of what we're doing.

+1 The Mote in God's Eye / The Gripping Hand reference!

Kurt

Aahz

Feb 21, 2009, 12:45:25 PM
to
In article <hfWdnTzVR8UNnwLU...@posted.visi>,

They definitely do not count as pure Python -- but I probably should have
mentioned that there are pre-existing libraries that can be used. It's
just that I/O is about the only area where there is a concerted effort to
ensure that the GIL gets released.

Nick Craig-Wood

Feb 23, 2009, 4:32:00 AM
to
sturlamolden <sturla...@yahoo.no> wrote:

Or something which runs in O(1)...

from gmpy import mpf, mpz, digits
from math import log, sqrt

root5_float = sqrt(5)
phi_float = (1 + root5_float) / 2
log_phi_float = log(phi_float)
log_root5_float = log(root5_float)
log2 = log(2)

def fibonacci_noniterative(n):
    """
    Work out n'th Fibonacci number in a noniterative fashion using Binet's formula
    """
    # Work out magnitude of answer
    bits = int(((n * log_phi_float - log_root5_float) / log2)*1.1)
    if bits < 0:
        bits = 0
    root5 = mpf(5, bits).sqrt()
    phi = (mpf(1, bits) + root5) / mpf(2, bits)
    f_n = phi**n / root5
    f_n = mpz(f_n + mpf(1, bits) / mpf(2, bits))
    return long(f_n)

It runs in O(1) if the O we are talking about is large integer
arithmetic. It actually runs in more like O(log(n)**1.5) time. It is
a lot faster than the iterative approach too which runs in
O(log(n)**2) for any n > ~40.

Times to calculate F(n)

n, iterative, noniterative
1, 0.000003, 0.000016
2, 0.000003, 0.000014
4, 0.000003, 0.000012
8, 0.000003, 0.000016
16, 0.000004, 0.000009
32, 0.000007, 0.000010
64, 0.000014, 0.000010
128, 0.000032, 0.000011
256, 0.000072, 0.000014
512, 0.000157, 0.000019
1024, 0.000361, 0.000031
2048, 0.000881, 0.000066
4096, 0.002504, 0.000178
8192, 0.007640, 0.000521
16384, 0.025362, 0.001572
32768, 0.090633, 0.004701
65536, 0.342724, 0.014132
131072, 1.335723, 0.045194
262144, 5.273360, 0.111201
524288, 20.791205, 0.301488
1048576, 82.617938, 0.833644

Here is the rest of the program just in case anyone wants to play

from time import time   # used by the timing loop below

def fibonacci_iterative(n):
    """
    Work out n'th Fibonacci number in an iterative fashion
    """
    a, b = 0, 1
    while n > 0:
        a, b = a + b, a
        n -= 1
    return a

if __name__ == "__main__":
    # warmup
    fibonacci_noniterative(100)
    print " n, iterative, noniterative"
    for e in range(30):
        i = 2 ** e
        t0 = time()
        f_iterative = fibonacci_iterative(i)
        t_iterative = time() - t0
        t0 = time()
        f_noniterative = fibonacci_noniterative(i)
        t_noniterative = time() - t0
        print "%10d, %10.6f, %10.6f" % (i, t_iterative, t_noniterative)
        if f_iterative != f_noniterative:
            print "Calculation error"
            print "iterative", f_iterative
            print "non iterative", f_noniterative
            print "difference", f_iterative - f_noniterative

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick
