GDL much faster than IDL ... when will we at least have an MKL compilation?

Kallisthene Kallisthene

Feb 23, 2018, 8:43:04 AM
to idl-pvwave

source : https://www.scivision.co/speed-of-matlab-vs-python-numpy-numba/

Harris IDL

(used only by astronomers?) is very slow compared to other modern computing languages, including GDL, the free open-source IDL-compatible program.

Matrix Operations Benchmark

This test multiplies two matrices that are too large to fit in CPU cache, so it is a test of system RAM bandwidth as well.

Task: multiply a 5000 x 5000 array by another 5000 x 5000 array, each filled with random double-precision (64-bit) floating-point numbers.

Results: in milliseconds, best time to compute the result

Implementation          i7-3770   W541/K1100M   E7500
Gfortran (matmul)          2147                  2147
Gfortran (dgemm)           2147
Gfortran (sgemm)
Ifort 14 (matmul)         18967         1717
Ifort 14 (dgemm)                         960
Python 3.5 (MKL)           2352         1160
Matlab R2015a (MKL)        2420
Julia 0.4.2                2394
IDL 8.4                                        86211
GDL 0.9.6                  2161                 3368
Python 3.5 (Cuda 7.5)     0.261        0.161
  • Python CUDA via Anaconda Accelerate (formerly NumbaPro): Note that Python CUDA scaled O(N^0.5), while Python MKL scaled O(N^2.8) or so.
  • Wow! GDL is so much faster than IDL at matrix multiplication.
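The Python/NumPy side of this task can be reproduced with a short timing script. This is only a sketch: it scales the problem down from 5000 x 5000 so it runs quickly, and the absolute times depend entirely on which BLAS (MKL, OpenBLAS, ...) the local NumPy build links against.

```python
import time
import numpy as np

# Benchmark task: multiply two N x N arrays of random float64 values.
# The original test used N = 5000; a smaller N keeps this sketch quick.
N = 500
rng = np.random.default_rng(0)
a = rng.random((N, N))
b = rng.random((N, N))

times = []
for _ in range(3):
    t0 = time.perf_counter()
    c = a @ b  # dispatches to the BLAS that NumPy was built against
    times.append(time.perf_counter() - t0)

# Report the best of several runs, as the benchmark does.
print(f"best of 3 runs: {min(times) * 1000:.2f} ms")
```

Comparing this against the equivalent `a # b` in IDL or GDL on the same machine is what the quoted table is doing.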

That's so sad ;-(

Jim Pendleton

Feb 23, 2018, 9:42:40 AM
to idl-pvwave
Without seeing the actual IDL code, I'd take with a grain of salt any marketing material like this that includes caveats like those stated in the first paragraph.

Jim P

Lajos Foldy

Feb 23, 2018, 10:09:10 AM
to idl-pvwave
Have you tried FL? FL uses OpenBLAS; its performance should be comparable to MKL.

Still, one test is no test. For any number n > 1, I can show you a test where FL is n times faster than IDL :-) (assuming memory proportional to n).

regards,
Lajos

Kallisthene Kallisthene

Feb 23, 2018, 10:33:03 AM
to idl-pvwave
Well,

When such a ubiquitous operation shows results like these (unsurprising, since IDL hasn't embraced the multi-core era), I don't think any caveat applies, unless there is a coding error.
You can read an old post from 2012 on the same subject here.
Right now I am moving CPU-intensive jobs onto the Python bridge, at the expense of clumsy and ugly code.

Best

Nikola Vitas

Feb 23, 2018, 1:39:20 PM
to idl-pvwave
There is a fresh update of the detailed comparison at NASA's Modeling Guru (source files included):
https://modelingguru.nasa.gov/docs/DOC-2676

The results are very different from those quoted in the original post.


Kallisthene Kallisthene

Feb 26, 2018, 3:47:04 AM
to idl-pvwave
Interesting numbers, but what is at stake here is making use of all those idle cores beyond the first one.

"All the above runs were conducted on a node that has 28 cores. Basically, only one core was used."

I do use the IDL_IDLBridge, but it is a lot of pain, and only one or two users out of several tens are truly able to use it.

Jim Pendleton

Feb 26, 2018, 7:23:46 PM
to idl-pvwave
There are a large number of IDL core routines written to take advantage of multicore capabilities. These have been present for over a decade and a half. See, for example, the following topics in the online help: Routines that Use the Thread Pool, CPU, and Thread Pool. The latter discusses when thread pooling can be a hindrance rather than a help. Just because you have a hammer doesn't mean every problem is a nail.

If your bottleneck is of a type that can't be optimized with these functions, a second option is the Asynchronous Job class, brand new in IDL 8.7, released just this past week. This class is intended to take much of the guesswork out of using the IDL_IDLBridge and/or SPAWN.

If there is a specific complexity in your code that can be expressed as a truly parallel algorithm, without conditionals and so forth, your best bang for the buck is custom GPU kernels, with the understanding that once you push data to the GPU you want to keep the processing there as long as possible. You don't want to push a few GB of floats up to the GPU for a simple array multiplication, download the result, then send the data back up for an addition, as a very basic example.

Jim P
"I work for Harris, but I have vanishingly little input on product features"

Brian McNoldy

Mar 1, 2018, 8:48:49 AM
to idl-pvwave
IDL has also been extremely popular among atmospheric scientists since it was first created (and still is today). Once you have a lot of experience and code in a language, it's hard to turn away from it, even if it's not always the best at every task.

Kallisthene Kallisthene

Mar 6, 2018, 11:00:55 AM
to idl-pvwave
Well, I was just saying that IDL doesn't use all available means to improve its speed, in particular in linear algebra, where I observed a speedup of 28 for large matrices (http://idlcoyote.com/comp.lang.idl-pvwave/index.php?t=msg&goto=81576&#msg_81576).
There is also the Julia language, with its LLVM-based just-in-time (JIT) compiler, which seems to achieve very good performance.

I have no problem with the existing solutions, but as I said earlier, casual developers (which many scientists are) aren't able to take advantage of multiple cores, and it can be argued that single-core performance has stagnated in recent years.

Wayne Landsman

Feb 19, 2020, 2:37:16 PM
to idl-pvwave
It looks like L3Harris has finally implemented the MKL library in IDL 8.7.3. They give several examples of the speed improvement in the seminar shown below. (The seminar is mostly about ENVI updates, but they do mention the MKL library update in IDL 8.7.3.) For example, multiplying two 2048 x 2048 matrices took 1.06 s in IDL 8.7.2 and takes 0.047 s in IDL 8.7.3 (vs. 0.053 s in Python/NumPy). They say that v8.7.3 should be released within a couple of weeks. Wayne
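As an aside, whether a given NumPy build is itself MKL-backed or links OpenBLAS can be checked from NumPy directly. A minimal sketch (the exact output format of np.show_config() varies across NumPy versions):

```python
import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build links against.
# MKL-backed builds (e.g. from conda's defaults channel) mention "mkl";
# pip wheels typically link OpenBLAS instead.
np.show_config()
```

This matters when comparing against MKL-enabled IDL, since a NumPy linked against a slower BLAS would make the comparison unfair.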
