Performance with QuTiP (OpenMP, MKL, etc.)


valentin....@1qbit.com

Aug 9, 2018, 9:19:16 PM
to QuTiP: Quantum Toolbox in Python
Hello everyone,

I am running an experiment that involves making numerous calls to mesolve / sesolve (on the Schrödinger equation), which account for most of the execution time of the code.
My goal is to reduce this execution time as much as possible, and I have two angles of attack:


1) Running with OpenMP

- I re-installed QuTiP with OpenMP support and ran the tests successfully (I notice, however, that I do not have Intel MKL installed).
- I passed the options to my solver properly (I verified that, at runtime, num_cpus=4, openmp_threads=4, and use_openmp=True); a sketch of roughly how I am setting this up is below.
However, I do not see any improvement in the performance of my code.
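
For reference, this is a minimal sketch of roughly how I am passing the options (QuTiP 4.x; the two-level system, rate, and time grid here are placeholders, not my actual problem):

import numpy as np
from qutip import Options, basis, destroy, mesolve, sigmax, sigmaz

# Toy two-level system standing in for the real Hamiltonian.
H = sigmaz()
c_ops = [0.1 * destroy(2)]
psi0 = basis(2, 0)
tlist = np.linspace(0, 10, 101)

# Solver options as described above: 4 CPUs, OpenMP enabled with 4 threads.
opts = Options(num_cpus=4)
opts.use_openmp = True
opts.openmp_threads = 4

result = mesolve(H, psi0, tlist, c_ops, e_ops=[sigmax()], options=opts)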

Q:
a) Am I missing something here? (I'm running on a Mac, so despite compiling the library with gcc/g++ and the OpenMP flag, clang takes over at runtime and does not support an OpenMP flag.)
b) Does mesolve / sesolve currently benefit from OpenMP acceleration? If yes, what speedup can I hope for, and in what circumstances?


2) Using mcsolve

The documentation states that mcsolve is a better approach than mesolve for large Hamiltonians / systems, so I'm really interested in trying it.
Since a Monte Carlo approach is by nature embarrassingly parallel, it should show something close to linear speedup with OpenMP and would be amazing on a GPU in the future.

Q:
a) Does it currently support OpenMP acceleration, and what should I expect?


More generally, what would you recommend regarding performance with QuTiP?

Thank you for your help!

Valentin

Paul Nation

Aug 10, 2018, 5:45:28 AM
to QuTiP Group
OpenMP (omp) does some good in mesolve, but not too much.  ODE solving is necessarily serial, so the only place where omp will help you is the sparse matrix - dense vector multiplication (spmv).  At that point you run into the fact that spmv is memory-bandwidth limited, and thus omp will only help as much as your memory throughput allows.  In short, you do get some speedup if the matrices can fit into the CPU cache, where memory access is faster.  For very large systems, there is no advantage to using more than one thread per physical CPU socket.

mcsolve does not use omp because it is already running in parallel using a process pool.  In general, it is faster for computing things like expectation values for larger systems.
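
As a rough sketch of what that looks like in practice (a placeholder two-level system; ntraj and num_cpus are just example values), the trajectories get farmed out across the pool of worker processes:

import numpy as np
from qutip import Options, basis, destroy, mcsolve, sigmax, sigmaz

H = sigmaz()
c_ops = [0.1 * destroy(2)]
psi0 = basis(2, 0)
tlist = np.linspace(0, 10, 101)

# num_cpus sets the size of the process pool; each worker evolves a share
# of the ntraj independent trajectories.
opts = Options(num_cpus=4)
result = mcsolve(H, psi0, tlist, c_ops, e_ops=[sigmax()], ntraj=500, options=opts)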


Valentin Senicourt

Aug 10, 2018, 2:17:17 PM
to qu...@googlegroups.com
Thank you for your answer, Paul.

I indeed expected mesolve to expose less parallelism, for the reasons you brought up.

As for mcsolve, what control do I have over this process pool? Ideally, I'd like to spawn one process per physical CPU core and have them all run in parallel.
Should I then ignore the OpenMP options for the solver and set num_cpus to multiprocessing.cpu_count(), as in the sketch below?
This code will run on my laptop (4 cores) but also on Amazon instances (16, 32 cores...), so I'm trying to make sure the processes map neatly.
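
In other words, something along these lines (just to illustrate the mapping I have in mind; whether this is the recommended way is exactly my question):

import multiprocessing
from qutip import Options

# One worker process per core reported by the OS, on whatever machine this runs on.
opts = Options(num_cpus=multiprocessing.cpu_count())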


