Five times difference in performance

Pu ZHANG

Nov 5, 2016, 9:54:03 AM
to qu...@googlegroups.com
Dear fellows, 

I'm using the function spectrum to calculate the spectrum of cavity radiation. The same script takes about five times longer on one computer than on the other. Below is some information about the two runs. 

QuTiP version: 3.1.0
System info: 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)]
Computation time: 0:09:47.520000

QuTiP version: 3.1.0
System info: 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Aug 21 2014, 18:22:21) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Computation time: 0:47:14.182388

The first is a standalone desktop running Windows, with an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz.

The second is a node on a Linux cluster, with a CPU clock frequency of 2.8 GHz. 

The cluster node has a lower clock frequency (2.8 GHz vs 3.6 GHz, a ratio of about 1.3), but I don't think that explains a difference of nearly five times. My question is: what other factors cause the slowdown? Is there anything I can do to improve the performance? 
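To compare the two machines on equal terms, it helps to record the same basic facts on each and to time the identical call, keeping the fastest run. A minimal sketch using only the Python standard library; the function names are illustrative and not part of QuTiP:

```python
import os
import platform
import time


def system_report():
    """Collect basic facts worth comparing between two machines."""
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "logical_cpus": os.cpu_count(),
        # Thread limit read by many OpenMP/BLAS builds; None if unset.
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
    }


def best_time(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print(system_report())
    # Stand-in workload; in practice pass the expensive call and its arguments.
    print(best_time(sum, range(10**6)))
```

On each machine one could then pass the script's own call, for example best_time(spectrum, H, wlist, c_ops, a, a.dag(), repeats=1), assuming the script calls QuTiP's spectrum with those positional arguments (H, wlist, c_ops, a are placeholders for the script's objects). For a run lasting tens of minutes, repeats=1 is enough.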

Thanks! 

Best regards, Pu Zhang

--
Faculty at School of Physics, Huazhong University of Science and Technology
Room 819 (N.), Yifu Science and Technology Building
1037 Luoyu Road, Wuhan, China

Andrew M.C. Dawes

Nov 5, 2016, 11:07:53 AM
to qu...@googlegroups.com
There is a lot more to computation speed than CPU frequency, especially if the code is moving data around in memory. The amount of RAM, RAM transfer rates, and the number of cores (the i7 is quad-core) all matter. Since you are using the same code version on both machines, there really isn’t anything QuTiP can do to speed things up on one machine but not the other. You may be able to make the code faster on both machines, but it’s not QuTiP that is making it slower.

I’d say, run it on the faster machine ;-)

Seriously, though, the reason to use a cluster is if a problem can be solved in parallel by multiple processors. If that is true (and in some cases this is automatic) then your quad-core processor is already acting like four nodes and probably gives you at least 3x speedup right there.
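The "at least 3x" figure can be sanity-checked with Amdahl's law, a standard model brought in here rather than something stated in the thread: if a fraction p of the run parallelizes perfectly over n cores, the speedup is S = 1 / ((1 - p) + p/n). A minimal sketch:

```python
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when a fraction p (0..1) of the runtime parallelizes perfectly."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def fraction_needed(target, n):
    """Parallel fraction p required to reach a target speedup on n > 1 cores."""
    if n <= 1 or not 1.0 <= target <= n:
        raise ValueError("need n > 1 and 1 <= target <= n")
    # Solve target = 1 / ((1 - p) + p / n) for p.
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    print(amdahl_speedup(0.9, 4))  # four cores, 90% parallel work
    print(fraction_needed(3, 4))   # parallel fraction needed for 3x on four cores
```

With p = 0.9, four cores give about 3.08x; reaching 3x requires roughly 89% (8/9) of the work to parallelize. If the workload is limited by memory bandwidth rather than arithmetic, the real speedup can fall well below this ideal.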

-Andy



--
You received this message because you are subscribed to the Google Groups "QuTiP: Quantum Toolbox in Python" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qutip+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pu ZHANG

Nov 5, 2016, 10:46:26 PM
to qu...@googlegroups.com
Thanks, Andrew! 

We use the cluster to run other MPI-parallelized codes. As for QuTiP, I don't see a parallel version. Does one exist? For the moment, we are using the cluster node as an ordinary computer. 

Best regards, Pu Zhang

Andrew Dawes

Nov 5, 2016, 11:37:18 PM
to qu...@googlegroups.com

And keep in mind that QuTiP uses NumPy for many operations, which (depending on how your libraries are compiled) can use multi-threaded processing when multiple cores are available. That may be another difference between the two machines.
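Whether NumPy's dense linear algebra runs multi-threaded depends on the BLAS library it was built against, and the thread count is usually set through environment variables read when NumPy is first imported. A sketch assuming an OpenMP, OpenBLAS, or MKL backend (which variable applies depends on the build; the others are ignored). QuTiP's sparse products may not go through BLAS at all, so this mainly affects dense steps such as eigendecompositions or dense matrix products:

```python
import os

# Assumption: the BLAS/OpenMP backend honours one of these variables. They are
# only read when NumPy is first imported, so set them (unless already set)
# before `import numpy`.
THREADS = "4"
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, THREADS)

import time

import numpy as np


def dense_matmul_time(n=1000, repeats=3, seed=0):
    """Best wall-clock time in seconds for an n x n dense matrix product (BLAS-bound)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    np.show_config()  # prints which BLAS/LAPACK NumPy was built against
    print(f"{dense_matmul_time():.3f} s")
```

Running the script with THREADS set to "1" and then to the core count, on each machine, shows whether that machine's BLAS build actually uses more than one thread.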

Andy

Pu ZHANG

Nov 6, 2016, 8:07:33 AM
to qu...@googlegroups.com
I did another test (time evolution with mesolve). The cluster node is still slower, but now its execution time is only about twice that of the desktop computer. So the performance difference depends strongly on the kind of calculation. 

Thanks again! 

Best regards, Pu Zhang

Paul Nation

Nov 6, 2016, 10:16:24 AM
to QuTiP Group

Assuming the system is large enough that Python function-call overhead is negligible, memory bandwidth is the main factor in the sparse-matrix dense-vector multiplication.
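The memory-bandwidth argument can be made quantitative. For a complex-valued CSR matrix, each stored entry costs about 8 real floating-point operations (one complex multiply-add) but moves roughly 20 bytes (a 16-byte value plus a column index that is typically 4 bytes), so the processor spends most of its time waiting on memory. A sketch that estimates the minimum traffic of one y = A @ x using SciPy's CSR layout and turns a timing into an effective bandwidth; it is an illustration, not QuTiP's internal routine:

```python
import time

import numpy as np
import scipy.sparse as sp


def spmv_bytes(A, x):
    """Lower bound on bytes moved by one CSR product y = A @ x.

    Each stored entry reads its value and column index, the row pointer is
    read once, x is read at least once, and y is written once.
    """
    A = sp.csr_matrix(A)
    nrows, ncols = A.shape
    y_itemsize = np.result_type(A.dtype, x.dtype).itemsize
    return (A.nnz * (A.data.itemsize + A.indices.itemsize)
            + (nrows + 1) * A.indptr.itemsize
            + ncols * x.itemsize
            + nrows * y_itemsize)


def effective_bandwidth(A, x, repeats=20):
    """Bytes per second achieved by the fastest of `repeats` products A @ x."""
    A = sp.csr_matrix(A)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        A @ x
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer on tiny inputs.
    return spmv_bytes(A, x) / max(best, 1e-9)


if __name__ == "__main__":
    n = 2_000_000
    A = sp.random(n, n, density=5e-6, format="csr", dtype=np.float64) * (1 + 1j)
    x = np.ones(n, dtype=complex)
    print(f"{effective_bandwidth(A, x) / 1e9:.1f} GB/s")
```

Running the same large product on both machines and comparing the effective bandwidth gives a direct check of whether memory bandwidth accounts for the gap.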

Pu ZHANG

Nov 7, 2016, 7:34:49 AM
to qu...@googlegroups.com
The memory bandwidth could be a reason. Thanks, Paul! 

Best regards, Pu Zhang

Pu ZHANG

Nov 7, 2016, 8:15:07 PM
to qu...@googlegroups.com
Hi, Andrew! 

I have a follow-up question about parallelization. Since QuTiP has no version supporting parallelization across nodes (e.g., via MPI), a calculation will not benefit from a cluster with many nodes. Is that right? 

Thanks! 

Best regards, Pu Zhang

Paul Nation

Nov 7, 2016, 11:00:09 PM
to QuTiP Group
People have run QuTiP on clusters, for example for optomechanical calculations. You would just need to distribute the individual runs across the nodes.  
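Distributing independent runs usually means a scheduler job array, where every job gets an index and processes its own share of a parameter sweep. A minimal sketch, assuming SLURM with an array submitted as --array=0-(N-1) (other schedulers expose the index under different variable names); the coupling values are a hypothetical sweep:

```python
import os


def chunk_for_task(items, task_id, n_tasks):
    """Return the items that job `task_id` (0-based) of `n_tasks` should run.

    Items are dealt out round-robin, so chunk sizes differ by at most one.
    """
    if not 0 <= task_id < n_tasks:
        raise ValueError("task_id must be in [0, n_tasks)")
    return items[task_id::n_tasks]


if __name__ == "__main__":
    # Assumption: SLURM job array; defaults let the script run outside SLURM.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    n_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    couplings = [0.01 * k for k in range(100)]  # hypothetical parameter sweep
    for g in chunk_for_task(couplings, task_id, n_tasks):
        # Replace with one independent QuTiP run per value, saving its result to disk.
        print(f"task {task_id}: would run simulation for g = {g:.2f}")
```

Each job writes its own output file, and a short post-processing step collects them; no communication between nodes is needed.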

The QuTiP library itself has little use for MPI. We do not have a way to do distributed arrays, evolution, etc. Indeed, even on a single machine, doing things in parallel is often of little, if any, advantage, because memory bandwidth, not the floating-point operations on the CPU, tends to be the limiting factor.


Pu ZHANG

Nov 7, 2016, 11:27:41 PM
to qu...@googlegroups.com
Yes, distributing jobs over nodes is the only workaround that came to mind. But for some simulations, such as a spectrum calculation, distributing over nodes doesn't seem to work. Or can a spectrum calculation also be divided into many jobs? 
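Whether a spectrum calculation splits this way depends on the method used: if each frequency point is evaluated independently, the frequency list can be cut into chunks, one per job, and the pieces concatenated afterwards, at the cost of repeating any shared setup (such as finding the steady state) in every job. A minimal sketch of the bookkeeping, with a Lorentzian as a hypothetical stand-in for the real spectrum:

```python
import numpy as np


def split_grid(wlist, n_jobs):
    """Split a frequency grid into n_jobs contiguous chunks whose sizes differ by at most one."""
    return np.array_split(np.asarray(wlist), n_jobs)


def merge_chunks(parts):
    """Reassemble per-job results, given in chunk order, into one spectrum array."""
    return np.concatenate(parts)


def toy_spectrum(w, w0=1.0, kappa=0.1):
    """Hypothetical stand-in: a normalized Lorentzian centred at w0 with full width kappa."""
    w = np.asarray(w, dtype=float)
    half = kappa / 2
    return (half / np.pi) / ((w - w0) ** 2 + half ** 2)


if __name__ == "__main__":
    wlist = np.linspace(0.0, 2.0, 1001)
    chunks = split_grid(wlist, 4)
    # Each chunk would run as its own job; in QuTiP that job might call
    # spectrum(H, chunk, c_ops, a, a.dag()) (assumed usage) and save the result.
    parts = [toy_spectrum(chunk) for chunk in chunks]
    print(np.allclose(merge_chunks(parts), toy_spectrum(wlist)))
```

If the chosen method instead does one expensive decomposition up front and evaluates every frequency cheaply from it, splitting the grid mostly repeats that decomposition and gains little.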

Thanks! 

Best regards, Pu Zhang
