Newsgroups: comp.lang.idl-pvwave
Subject: Speed does matter
From: Kallisthène
Date: Wed, 3 Oct 2012 01:17:49 -0700 (PDT)

It started when we ran the same algorithm, coded similarly (we hope) in Matlab and IDL, heavy on linear algebra and working with very large matrices. Unfortunately we got huge performance discrepancies: not a factor of two or three, but sometimes in the range of fifty!

We then looked for existing benchmarks and found a very partial one here (http://fwenvi-idl.blogspot.com/2011/10/numpy-is-fast.html) for multiplying large matrices, in which IDL outperforms Python by a factor of 27.

But this is not the end of the story. From the same blog article we learned of the existence of Numpy compiled against the Intel Math Kernel Library. With these binaries, Python now outperforms IDL by a factor greater than 10!

These incredible numbers prompted us to pit this MKL flavor of Python/Numpy (32 bits) against IDL (64 bits). Here are some results obtained on a recent Windows 7 computer with 4 cores, unfortunately without the original plots.
********************************************************************************

Element-wise multiplication (*) of FLOAT32 matrices : IDL faster by ×1.5
Element-wise multiplication (*) of FLOAT64 matrices : IDL faster by ×1.4
Element-wise multiplication (*) of Complex matrices : IDL faster by ×2.8

Matrix multiplication (#) of FLOAT32 matrices : Python/Numpy MKL faster by ×12
Matrix multiplication (#) of FLOAT64 matrices : Python/Numpy MKL faster by ×6
Matrix multiplication (#) of Complex matrices : Python/Numpy MKL faster by ×3.1

FFT of FLOAT32 matrices : IDL faster by ×1.2
FFT of FLOAT64 matrices : roughly equivalent
FFT of Complex matrices : Python/Numpy MKL faster by ×1.6

The inverse of a square array (with LAPACK routines in both cases) is harder to summarize with a single number, since the two timing curves have different slopes and the ratio depends on the matrix size.

Invert of FLOAT32 matrices : Python/Numpy MKL faster by ×1.4 at 100×100, increasing to ×20 at 1200×1200.
Invert of FLOAT64 matrices : Python/Numpy MKL faster by ×3.0 at 100×100, increasing to ×28 at 700×700.
Invert of Complex matrices : Python/Numpy MKL faster by ×2.7 at 100×100, increasing to ×9 at 800×800.

The Singular Value Decomposition of a square array (with LAPACK routines in both cases) is also harder to summarize with a single number, for the same reason.

SVD of FLOAT32 matrices : Python/Numpy MKL faster by ×1.4 at 100×100, increasing to ×20 at 1200×1200.
SVD of FLOAT64 matrices : Python/Numpy MKL at ×0.73 at 100×100 (that is, slower than IDL at that size), increasing to ×28 at 1800×1800.
SVD of Complex matrices : Python/Numpy MKL faster by ×3.8 at 100×100, increasing to ×18 at 1500×1500.
********************************************************************************

How is it achieved? Well, look at your CPU performance tab while running Python MKL: all cores are at 100 %, while most IDL routines hardly top 30 % on a 4-core computer. This can explain part of the performance gain.

Similar benchmarks were run on another computer between Python/Numpy MKL and Matlab, showing other artifacts but mostly "comparable" performance (in particular with Python MKL 64 bits). These results highlight the incredible performance impact of the Intel Math Kernel Library, in particular for linear algebra routines. Since this library is licensed royalty-free, per developer, I dream of seeing a future IDL compiled against it. Any chance?
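
For anyone who wants to reproduce the Numpy side of these numbers, here is a minimal sketch of how such timings could be collected. It is not the script used for the results above: the function names (best_time, speed_factor, numpy_cases, run, print_build_info), the default matrix sizes, and the choice of random test matrices are all assumptions made for illustration. The "speed multiplicative" is taken here to mean the reference time divided by the candidate time, so a value below 1 means the candidate is slower.

```python
import time


def best_time(fn, repeats=5):
    """Return the smallest wall-clock time (seconds) over several calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speed_factor(t_reference, t_candidate):
    """How many times faster the candidate is; below 1 means it is slower."""
    return t_reference / t_candidate


def numpy_cases(n, dtype="float64"):
    """Hypothetical workloads mirroring the list above, on an n x n matrix."""
    import numpy as np  # imported here so the helpers above need only the stdlib

    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n)).astype(dtype)
    return {
        "elementwise (*)": lambda: a * a,
        "matrix product (#)": lambda: a @ a,
        "fft": lambda: np.fft.fft2(a),
        "invert": lambda: np.linalg.inv(a),
        "svd": lambda: np.linalg.svd(a),
    }


def run(sizes=(100, 400, 1200)):
    """Print the best time per operation and size, for the Numpy side only."""
    for n in sizes:
        for name, fn in numpy_cases(n).items():
            print(f"{n:5d}  {name:20s} {best_time(fn):.4f} s")


def print_build_info():
    """Show which BLAS/LAPACK libraries this Numpy build was linked against."""
    import numpy as np

    np.show_config()
```

Calling run() prints one line per operation and size; the IDL timings would have to be measured separately and combined with speed_factor. Calling print_build_info() is a quick way to check whether a given Numpy build really uses MKL. To separate the effect of multithreading from the library itself, one option is to set the MKL_NUM_THREADS environment variable to 1 before starting Python and compare again.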