Hello everyone,
I am running an experiment that makes numerous calls to mesolve / sesolve (solving the Schrödinger equation), and these calls account for most of the code's execution time.
My goal is to reduce this execution time as much as possible, and I have two angles of attack:
1) Running with OpenMP
- I re-installed QuTiP with OpenMP support and ran the tests successfully (I did notice, however, that I do not have Intel MKL installed).
- I passed options to my solver properly (I verified that, at runtime: num_cpus=4, openmp_threads=4, use_openmp=True)
However, I do not see any improvement in the performance of my code.
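For reference, here is a minimal sketch of how the two configurations could be timed against each other. The names best_time and run_solver are hypothetical and not part of QuTiP; run_solver is only a placeholder workload and would need to be replaced by the real mesolve / sesolve call with use_openmp switched on or off.

```python
import time

def best_time(fn, repeats=5):
    """Return the smallest wall-clock time (in seconds) over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

def run_solver(use_openmp):
    # Hypothetical placeholder workload: swap in the actual mesolve / sesolve
    # call, built with options whose use_openmp matches the argument.
    # As written, the argument is ignored and both timings measure the same work.
    return sum(i * i for i in range(100_000))

t_off = best_time(lambda: run_solver(use_openmp=False))
t_on = best_time(lambda: run_solver(use_openmp=True))
print(f"OpenMP off: {t_off:.4f} s, on: {t_on:.4f} s, ratio: {t_off / t_on:.2f}")
```

Taking the minimum over a few repeats reduces noise from other processes; a ratio close to 1 would match what I observe.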
Q:
a) Am I missing something here? (I'm running on a Mac, so although I compiled the library with gcc/g++ and the OpenMP flag, clang takes over at runtime and doesn't have an OpenMP flag.)
b) Does mesolve / sesolve currently benefit from OpenMP acceleration? If yes, what speedup can I hope for, and in what circumstances?
2) Using mcsolve
The documentation states that mcsolve is a better approach than mesolve for large Hamiltonians / systems, so I'm really interested in trying this guy.
Since a Monte Carlo approach is by nature embarrassingly parallel, it should show something close to linear speedup with OpenMP and would be amazing on a GPU in the future.
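To illustrate what I mean by embarrassingly parallel, here is a toy sketch in plain Python. It is not QuTiP code and not how mcsolve is implemented. It assumes a two-level atom that starts in the excited state and decays at a rate GAMMA, so each trajectory reduces to one exponentially distributed jump time. The names run_batch and excited_population are made up for this example, and it uses worker processes rather than OpenMP threads.

```python
import random
from concurrent.futures import ProcessPoolExecutor

GAMMA = 1.0  # assumed decay rate, arbitrary units

def run_batch(job):
    """Simulate one batch of independent trajectories.

    Returns how many trajectories are still excited at time t.
    """
    seed, n_traj, t = job
    rng = random.Random(seed)  # separate random stream for each batch
    survived = 0
    for _ in range(n_traj):
        jump_time = rng.expovariate(GAMMA)  # time of the single quantum jump
        if jump_time > t:
            survived += 1
    return survived

def excited_population(t, n_traj=20000, n_workers=4, parallel=True):
    """Estimate the excited-state population at time t by averaging trajectories."""
    per_batch = n_traj // n_workers
    jobs = [(seed, per_batch, t) for seed in range(n_workers)]
    if parallel:
        # Batches share no state, so they can run in separate processes.
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            counts = list(pool.map(run_batch, jobs))
    else:
        counts = [run_batch(job) for job in jobs]
    return sum(counts) / (per_batch * n_workers)
```

Calling excited_population(1.0) should give a value close to exp(-GAMMA * 1.0). With parallel=True, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes (macOS, Windows). Since the batches never communicate, the run time should scale roughly inversely with the number of workers until process start-up overhead dominates.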
Q:
a) Does it currently support OpenMP acceleration, and what should I expect?
More generally, what would you recommend regarding performance with QuTiP?
Thank you for your help!
Valentin