Re: [spyder] How does Spyder work around the GIL? Is it MPI compatible?


Carlos Córdoba

May 24, 2013, 6:01:42 PM
to spyd...@googlegroups.com
Hi Ana,

I think numpy supports threads if it was compiled using Intel's Math
Kernel Library, but you'd be better off asking on the numpy mailing list.
You could also be interested in a project called Numba, which lets you
accelerate vectorized computations based on numpy quite a lot just by
using a decorator.
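
As a rough sketch of what "just using a decorator" looks like (not taken from this thread): the example assumes a current Numba release that provides `numba.njit`, which differs from the 2013 API, and it falls back to plain Python if Numba is not installed. The function name `sum_of_squares` is made up for illustration.

```python
# Sketch only: assumes a Numba version that provides `njit`.
try:
    from numba import njit
except ImportError:
    # Fallback so the example still runs (without any speed-up).
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(10))
```

The first call includes compilation time when Numba is present, so any speed-up only shows on later calls or on large inputs.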

Spyder doesn't do any threading of its own on numpy or scipy but I think
it could be really useful to you for writing and debugging code.

However I think you should forget about using threads because that's not
the right way to parallelize things in Python (because of the GIL).
Please take a look at Numba and the other libraries I mentioned in my
other post.

Cheers,
Carlos

On 22/05/13 13:28, sokovic....@gmail.com wrote:
> Hello,
>
> please help me answer those questions, along with: do numpy and scipy
> use threads, or does Spyder?
> I am planning to install Spyder on a Cray XE6 machine, with the intention
> of doing some Python MPI and threading work, and debugging.
> Would Spyder be of use to me?
>
> Thanks
> Ana

sokovic....@gmail.com

May 28, 2013, 2:47:12 PM
to spyd...@googlegroups.com
Hi Carlos,

Thank you so much for these kind answers. I have just a few more questions:
1. How different is multiprocessing from Spyder in terms of implementation?
2. Does it mean that Spyder uses HTTP?
3. Does IPython stand for interactive Python? How am I going to deal with that and qsub?

Appreciate your help,
Ana

David Verelst

May 29, 2013, 11:41:25 AM
to spyder
Hi,

I'll try to explain how I understand your question, and how I would approach this issue. Please correct me if I am wrong. There are many possible ways of using parallel computational power, and the best solution heavily depends upon your specific problem. I am far from an expert, but from the little that I've learned over the past years I know that there is no easy answer.

1) Spyder uses multi-threading at the application level (using QThread, if I understand correctly) so that things like code completion, the monitor, documentation lookup, ... don't freeze the whole program while they run. However, this is not related to running your Python scripts. Spyder's multiprocessing implementation is, as far as I understand it, not related at all to the scripts you want to run. Python/IPython consoles run in a process that is separate from the main Spyder process.

When you want to use the parallel power of your computer cluster, you need to launch a Python or IPython console in Spyder that is aware of all those CPUs/GPUs. From there you can use the built-in modules threading and multiprocessing to start explicitly using all that computational power. Note that, depending on what you are trying to achieve, programming parallel applications can be challenging. The summer school "Advanced Scientific Programming in Python" [1] has some very nice lectures on this, and I recommend having a look at those informative slides.
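
To make that concrete, here is a small sketch using only the standard library. It asks Python how many CPUs the console can see and spreads a CPU-bound function over worker processes with `multiprocessing.Pool`. The function names `square` and `parallel_squares` are made up for illustration, and this only uses the cores of a single node, not a whole cluster.

```python
import multiprocessing
import os


def square(x):
    """CPU-bound work done in a worker process (kept trivial here)."""
    return x * x


def parallel_squares(values, workers=None):
    """Map `square` over `values` using a pool of worker processes."""
    # os.cpu_count() can return None, so fall back to one worker.
    workers = workers or os.cpu_count() or 1
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print("CPUs visible to this process:", os.cpu_count())
    print(parallel_squares(range(8)))
```

Each worker is a separate process with its own interpreter and its own GIL, which is why this approach can scale for CPU-bound work where threads do not.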

2) I don't think Spyder uses HTTP to connect to other Python consoles, and I'm not sure how HTTP is related to the problem at hand.

3) An interactive Python console refers to an interactive workflow and does not necessarily have anything to do with qsub (I assume you are referring to your cluster's mechanism for submitting and queuing jobs?).

When I look at how we have our clusters configured, I could theoretically imagine the following workflow:
* launch an IPython instance on the cluster with qsub and let it use as many CPUs as you see fit (for instance, specify #PBS -lnodes=xxx:ppn=yyy in your PBS script), and give it as much wall time as you think you need.
* connect to that IPython console using the notebook/web interface (I know people do that, I just don't know how. There has to be some documentation available somewhere; ask on the IPython mailing list, or see [8])
* within Spyder, connect to the IPython console running on the cluster. I don't think that is implemented yet, and I'm not sure whether it's planned.
* or check out [8]: Using IPython for parallel computing
* or rent an IPython instance on a cluster configured for you at [11]

You can release the GIL when using Cython; see for example the slides on Cython [1]. You cannot release the GIL in a Python script. For explicit concurrency directly in your Python script, use the built-in modules threading and multiprocessing.
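
A hedged illustration with the standard `threading` module: pure Python code has no call to release the GIL explicitly, but blocking calls implemented in C, such as `time.sleep` or file and network I/O, release it while they wait, so threads can overlap that waiting. The helper name `timed_sleepers` is invented for this sketch.

```python
import threading
import time


def timed_sleepers(n_threads=4, delay=0.2):
    """Run `n_threads` threads that each sleep `delay` seconds.

    Returns the wall-clock time it takes for all of them to finish.
    """
    threads = [threading.Thread(target=time.sleep, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The sleeps overlap, so this is close to `delay`, not n_threads * delay.
    print("elapsed: %.2f s" % timed_sleepers())
```

CPU-bound loops written in pure Python would not overlap this way, which is the case that Cython's `nogil` blocks or the multiprocessing module address.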

NumPy/SciPy/Numba and other Python modules already use, in some cases, the parallel power of your machine. Some Numba examples can be found here [2] and here [3]. For NumPy/SciPy, this depends on your BLAS/LAPACK implementation (such as MKL, OpenBLAS, ACML), the low-level number-crunching routines on which they are built. Building NumPy from source against a BLAS/LAPACK implementation optimized for your machine can be challenging, depending on your experience and skills [9] [10], but performance can increase significantly [4].
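
One way to see which BLAS/LAPACK libraries a given NumPy build is linked against is `numpy.show_config()`. The sketch below captures its printed report as a string; the wrapper name `describe_numpy_build` is made up, and the format of the report varies between NumPy versions.

```python
import contextlib
import io


def describe_numpy_build():
    """Return NumPy's build configuration report as a string."""
    try:
        import numpy as np
    except ImportError:
        return "NumPy is not installed"
    buffer = io.StringIO()
    # show_config() prints its report, so capture stdout.
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_numpy_build())
```

If the report mentions MKL or OpenBLAS, operations such as large matrix products may already run on several cores without any change to your script.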

Other, more exotic libraries that can help unleash the parallel power of CPUs/GPUs (I only know that they exist): Magma [5], Plasma [6], CUBLAS [7].

OK, this reply exploded in length and I cannot vouch for its quality or its usefulness... I'd better stop here :-)

Regards,
David



sokovic....@gmail.com

May 30, 2013, 11:13:29 AM
to spyd...@googlegroups.com
Thanks, I like your answers, but I have some more questions :)




On Wednesday, May 29, 2013 10:41:25 AM UTC-5, David wrote:
> Hi,
>
> I'll try to explain how I understand your question, and how I would approach this issue. Please correct me if I am wrong. There are many possible ways of using parallel computational power, and the best solution heavily depends upon your specific problem. I am far from an expert, but from the little that I've learned over the past years I know that there is no easy answer.
>
> 1) Spyder uses multi-threading at the application level (using QThread, if I understand correctly) so that things like code completion, the monitor, documentation lookup, ... don't freeze the whole program while they run. However, this is not related to running your Python scripts. Spyder's multiprocessing implementation is, as far as I understand it, not related at all to the scripts you want to run. Python/IPython consoles run in a process that is separate from the main Spyder process.

What is QThread? 
So it is useful if you want to use python as a wrapper to run other jobs?

> When you want to use the parallel power of your computer cluster, you need to launch a Python or IPython console in Spyder that is aware of all those CPUs/GPUs. From there you can use the built-in modules threading and multiprocessing to start explicitly using all that computational power. Note that, depending on what you are trying to achieve, programming parallel applications can be challenging. The summer school "Advanced Scientific Programming in Python" [1] has some very nice lectures on this, and I recommend having a look at those informative slides.

I am not sure why that would be useful. Can't I just run
grep processor /proc/cpuinfo | tail -n 1 | awk '{print $3+1}'
?
 

> 2) I don't think Spyder uses HTTP to connect to other Python consoles, and I'm not sure how HTTP is related to the problem at hand.

In a previous reply Carlos said it uses sockets. What protocol/service does it use for those sockets?

> 3) An interactive Python console refers to an interactive workflow and does not necessarily have anything to do with qsub (I assume you are referring to your cluster's mechanism for submitting and queuing jobs?).

So is it interactive or not? Why do you call it a workflow?

> When I look at how we have our clusters configured, I could theoretically imagine the following workflow:
> * launch an IPython instance on the cluster with qsub and let it use as many CPUs as you see fit (for instance, specify #PBS -lnodes=xxx:ppn=yyy in your PBS script), and give it as much wall time as you think you need.
> * connect to that IPython console using the notebook/web interface (I know people do that, I just don't know how. There has to be some documentation available somewhere; ask on the IPython mailing list, or see [8])
> * within Spyder, connect to the IPython console running on the cluster. I don't think that is implemented yet, and I'm not sure whether it's planned.
> * or check out [8]: Using IPython for parallel computing
> * or rent an IPython instance on a cluster configured for you at [11]

> You can release the GIL when using Cython; see for example the slides on Cython [1]. You cannot release the GIL in a Python script. For explicit concurrency directly in your Python script, use the built-in modules threading and multiprocessing.

How would you go about releasing the GIL, and do you think it would have consequences?

David Verelst

May 30, 2013, 11:32:05 AM
to spyder
Hi,

Sorry, won't have time to go into more detail for now. If you want a more relevant answer, you could try to formulate a very concrete question. But please bear in mind that this is the spyder mailing list, and as such we do not deal with complex parallel stuff such as MPI, multithreading Python, IPython, the GIL, etc.
As a side note: Carlos is a more reliable source since he is a Spyder dev, I am just jumping in from time to time on the mailing list.

I have never worked with QThread (I just believe it is being used by Spyder); more info can be found on the web: https://duckduckgo.com/?q=QThread

Regards,
David

sokovic....@gmail.com

May 30, 2013, 1:59:14 PM
to spyd...@googlegroups.com
Thanks. Do I need to resubmit my questions, or just wait for Carlos to answer?

David Verelst

May 31, 2013, 7:03:10 AM
to spyder
There is no need for resubmitting, but if you want more helpful answers you should consider reformulating your question in a more specific way (it is very hard to answer a very general and broad question). And as Carlos points out, you could consider asking those specific questions on the subject of concurrency in Python programming (IPython, threads, processes, MPI, numpy, numba, etc) on their corresponding mailing lists.

Regards,
David

Carlos Córdoba

May 31, 2013, 11:32:45 AM
to spyd...@googlegroups.com
Hi Ana,

The sockets used by Spyder are TCP ones. And QThread is a subcomponent of the Qt graphical toolkit, used to run graphical operations asynchronously.

But in the end you don't need to know these details, because Spyder won't help you *by itself* to run your programs in parallel. You need to use the right libraries to do the job, which are the ones David and I have told you about.
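
For anyone wondering what "TCP sockets" means in practice, here is a generic sketch with the standard `socket` module: a tiny echo server on localhost and a client that talks to it. It is not Spyder's actual protocol or code; the function names and the message are made up.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever arrives first."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def tcp_round_trip(message=b"hello"):
    """Send `message` to a local TCP echo server and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"ping"))
```

Two processes on the same machine can exchange data this way without any web server or HTTP involved.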

Cheers,
Carlos

On 30/05/13 12:59, sokovic....@gmail.com wrote: