I have a little question about the speed of numpy vs IDL 7.0. I did a
very simple check by computing just a cosine in a loop. I was quite
surprised to see an order of magnitude of difference between numpy and
IDL; I would have thought that for such a basic function, the speed
would be approximately the same.
I suppose that some of the difference may come from the default data
type: 64 bits in numpy and 32 bits in IDL. Is there a way to change the
numpy default data type (without recompiling)?
And I'm not an expert at all, maybe there is a better explanation, like
a better use of the several CPU cores by IDL?
I'm working with Windows 7 64 bits on a Core i7.
Any hint is welcome.
Thanks.
Here is the IDL code:

Julian1 = SYSTIME( /JULIAN , /UTC )
for j=0,9999 do begin
    for i=0,999 do begin
        a = cos(2*!pi*i/100.)
    endfor
endfor
Julian2 = SYSTIME( /JULIAN , /UTC )
print, (Julian2-Julian1)*86400.0
end
result:
% Compiled module: $MAIN$.
2.9999837
The Python code:

from numpy import *
from time import time

time1 = time()
for j in range(10000):
    for i in range(1000):
        a = cos(2*pi*i/100.)
time2 = time()
print time2-time1
result:
In [2]: run python_test_speed.py
24.1809999943
_______________________________________________
NumPy-Discussion mailing list
NumPy-Di...@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
You're right: with math.cos, the code takes 4.3 s to run. Not as fast as
IDL, but a lot better.
25/11/10 @ 11:13 (+0100), thus spake Jean-Luc Menut:
> I suppose that some of the difference may come from the default data
> type of 64bits in numpy and 32 bits in IDL. Is there a way to change the
> numpy default data type (without recompiling) ?
This is probably not the issue.
> And I'm not an expert at all, maybe there is a better explanation, like
> a better use of the several CPU core by IDL ?
I'm not an expert either, but the basic idea you have to grasp is
that "for" loops in Python are slow, and numpy is not going to change
this. Instead, numpy lets you work with "vectors" and "arrays"
so that you don't need to put loops in your code. So, you have to
change the way you think about things; it takes a little while to get
used to at first.
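For instance, the inner loop of the original test can be replaced by a single vectorized call. A minimal sketch of the idea:

```python
import numpy as np

# all 1000 loop indices at once, instead of a Python "for" loop
i = np.arange(1000)

# one vectorized call computes all 1000 cosines in C, not in Python
a = np.cos(2 * np.pi * i / 100.0)
```

The loop body runs once per element in interpreted Python; the vectorized call runs the whole computation inside numpy's compiled code.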
Cheers,
--
Ernest
Whilst you've imported everything from numpy, you're not really using numpy:
you're still using a slow Python (double) loop. The power of numpy comes from
vectorising your code, i.e. applying functions to whole arrays of data.
The example below demonstrates an 80-fold increase in speed by vectorising the
calculation:
import numpy as np
from numpy import empty, cos, pi

def method1():
    a = empty([1000, 10000])
    for j in range(10000):
        for i in range(1000):
            a[i, j] = cos(2*pi*i/100.)
    return a

def method2():
    ij = np.repeat((2*pi*np.arange(1000)/100.)[:, None], 10000, axis=1)
    return np.cos(ij)
In [46]: timeit method1()
1 loops, best of 3: 47.9 s per loop
In [47]: timeit method2()
1 loops, best of 3: 589 ms per loop
In [48]: allclose(method1(), method2())
Out[48]: True
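Since every column of the array in method2 is identical, the repeat can even be avoided: compute the 1000 cosines once and broadcast them to the full shape. A sketch of this alternative (using np.broadcast_to, which creates a read-only view without copying the data):

```python
import numpy as np

# the 1000 cosine values, as a (1000, 1) column
col = np.cos(2 * np.pi * np.arange(1000) / 100.0)[:, None]

# broadcast the column to the full (1000, 10000) shape; no data is copied
a = np.broadcast_to(col, (1000, 10000))
```

This trades the memory and time of materialising 10000 identical columns for a zero-copy view.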
Yes, I know, but IDL shares this characteristic with numpy, and sometimes
you cannot avoid loops. Anyway, it was just a test to compare the speed of
the cosine function in IDL and numpy.
The point others are trying to make is that
you *instead* tested the speed of creation
of a certain object type. To test the *function*
speeds, feed both large arrays.
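A sketch of such a comparison: feed the same large array to both, applying math.cos element by element in a Python loop versus one vectorized np.cos call.

```python
import math
import timeit
import numpy as np

x = np.linspace(0.0, 2.0 * math.pi, 100000)

# pure-Python loop: math.cos applied to each element in turn
t_loop = timeit.timeit(lambda: [math.cos(v) for v in x], number=10)

# single vectorized call on the whole array
t_vec = timeit.timeit(lambda: np.cos(x), number=10)
```

On arrays this size, the vectorized call is typically one to two orders of magnitude faster, which isolates the function cost from the per-call overhead.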
>>> import math
>>> import numpy as np
>>> type(0.5)
<type 'float'>
>>> type(math.cos(0.5))
<type 'float'>
>>> type(np.cos(0.5))
<type 'numpy.float64'>
hth,
Alan Isaac
> Yes, I know, but IDL shares this characteristic with numpy, and sometimes
> you cannot avoid loops. Anyway, it was just a test to compare the speed of
> the cosine function in IDL and numpy.
No, you compared IDL looping and python looping. You did not even use
numpy. Loops are slow in python, and will remain so in the near
future. OTOH, there are many ways to deal with this issue in python
compared to IDL (cython being a fairly popular one).
David
The vectorised numpy version already blows these results away.
Here is what I get using the IDL version (with IDL v7.1):
IDL> .r test_idl
% Compiled module: $MAIN$.
4.0000185
I[10]: time run test_python
43.305727005
and using a Cythonized version:
from math import pi

cdef extern from "math.h":
    float cos(float)

cpdef float myloop(int n1, int n2, float n3):
    cdef float a
    cdef int i, j
    for j in range(n1):
        for i in range(n2):
            a = cos(2*pi*i/n3)
    return a
Compiling via a setup.py file with: python setup.py build_ext --inplace
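The setup.py itself is not shown in the thread; a minimal sketch of what it might look like (assuming the Cython source above is saved as mycython.pyx; the module and file names are taken from the import below):

```python
# setup.py -- hypothetical build script for the mycython.pyx example above
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("mycython", ["mycython.pyx"])],
)
```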
and importing the function into IPython
from mycython import myloop
I[6]: timeit myloop(10000, 1000, 100.0)
1 loops, best of 3: 2.91 s per loop
--
Gökhan
Bruce Sherwood
As others have already pointed out, you should make sure that you use
numpy.cos with arrays in order to get good performance.
I don't know whether IDL is using multiple cores or not, but if you are
looking for ultimate performance, you can always use Numexpr, which makes
use of multiple cores. For example, on a machine with 8 cores (with
hyperthreading), we have:
>>> from math import pi
>>> import numpy as np
>>> import numexpr as ne
>>> i = np.arange(1e6)
>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 85.2 ms per loop
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 8.28 ms per loop
If you don't have a machine with a lot of cores, but still want to get
good performance, you can still link Numexpr against Intel's VML (Vector
Math Library). For example, using Numexpr+VML with only one core (in
another machine):
>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 66.7 ms per loop
>>> ne.set_vml_num_threads(1)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 9.1 ms per loop
which also gives a pretty good speedup. Curiously, Numexpr+VML is not
that good at using multiple cores in this case:
>>> ne.set_vml_num_threads(2)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
10 loops, best of 3: 14.7 ms per loop
I don't really know why Numexpr+VML is taking more time with 2 threads
than with only one, but it is probably due to Numexpr requiring better
fine-tuning in combination with VML :-/
--
Francesc Alted
Yes, I understand that. I just want to stress that it was not a benchmark
(nor a criticism) but a test to find out whether it was worth translating
an IDL code directly into python/numpy before trying to optimize it (I
know python better than IDL). I expected approximately the same speed for
both, was surprised by the result, and wanted to know if there was an
obvious reason besides the lack of optimization for scalars.