Nogil taking longer time than single threaded

Souvik Ghosh

Dec 15, 2021, 10:28:33 AM
to cython-users
Hello, 

This is my Cython code:

cdef long subtractor_p(long a, long b) nogil:
    cdef:
        int i
        long ac = a
        long bc = b

    for i from 0 <= i < bc:
        ac -= i
    return ac


cpdef long subtractor_py(long long int a, long long int b):
    cdef:
        long long int ac=a
        long long int bc=a
        long long int r
        int num_threads

    openmp.omp_set_dynamic(1)
    with nogil, parallel():
        num_threads = openmp.omp_get_num_threads()
        r = subtractor_p(a, b)
        with gil:
            return r


It takes longer than this version below:

cpdef long subtractor_py(long long int a, long long int b):
    cdef:
        long long int ac=a
        long long int bc=a
        long long int r
        int num_threads

    r = subtractor_p(a, b)
    return r

I have one more question: how can I tell whether a function is being called with the GIL or without it? I started using Cython recently and I'm quite confused about how to do this correctly. The documentation is not descriptive enough for beginners to follow. Any help is appreciated, thank you.


Souvik Ghosh

Dec 16, 2021, 10:44:56 AM
to cython-users
What is the difference between
with nogil:
and
with nogil, parallel():

I don't know which one to use or how they differ from each other.
I love using Cython. It gives us many features beyond what Python offers.
But, I'm not getting any responses of my messages from anyone. I hope I can get some support being here. 
Long live cython. Thanks. 

da-woods

Dec 16, 2021, 3:58:18 PM
to cython...@googlegroups.com
"with nogil:" releases the GIL for the duration of the with block. Releasing the GIL allows another Python thread to run. Releasing the GIL does not have any impact on the speed of the code within the block.

"with nogil, parallel()" releases the GIL and starts an OpenMP "parallel" block. If you plan to use Cython's parallel features then you should read up about OpenMP since Cython's features are based around OpenMP. On a 4 core computer this will typically split into 4 threads. As I told you last time: all the threads will perform exactly the same work. Therefore it will be the same speed or slower because you have simply repeated the same work multiple times. If you want to use this section well then use the cython.parallel.threadid() function to get the thread ID and an "if" statement to choose different work depending on the thread ID.
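The idea of using the thread ID to pick different work can be sketched in plain Python (not Cython, so it runs without compiling). This is only an illustration under assumptions: the names subtractor_serial, partial_sum and subtractor_split are hypothetical, and Python threads still hold the GIL, so this shows how the work is divided rather than any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def subtractor_serial(a, b):
    """Same loop as subtractor_p: subtract 0, 1, ..., b-1 from a."""
    ac = a
    for i in range(b):
        ac -= i
    return ac


def partial_sum(thread_id, num_threads, b):
    """Work for ONE thread: only the indices thread_id, thread_id + num_threads, ..."""
    return sum(range(thread_id, b, num_threads))


def subtractor_split(a, b, num_threads=4):
    """Each worker handles a different slice; partial sums are combined at the end."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(
            lambda tid: partial_sum(tid, num_threads, b),
            range(num_threads),
        ))
    return a - sum(partials)
```

In the original code every thread effectively ran the equivalent of subtractor_serial(a, b) in full, so four threads did four times the total work. Here each thread only touches its own quarter of the indices.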

"prange" is typically more useful - it can be used to split a "for" loop between multiple threads.

> But, I'm not getting any responses of my messages from anyone.

Bear in mind there are a fairly limited number of people who regularly answer here and they may have other priorities - you should not necessarily expect an immediate response.

Souvik Ghosh

Dec 17, 2021, 12:27:41 PM
to cython-users
Unfortunately,

with nogil, parallel(): takes longer than with nogil: even though parallel() opens 4 threads on my system. I measured many different functions using the same .so file. In each case, it takes longer than the single-threaded version. How can I get a performance benefit from my CPU cores? If it is using multiple threads, why does it take so much more time than a single thread?

da-woods

Dec 17, 2021, 3:49:38 PM
to cython...@googlegroups.com
As I said in my previous two replies:

It is taking longer because you have not split the work. Instead you are repeating the same work 4 times.

I'll treat this as my final attempt to explain it.

Souvik Ghosh

Dec 18, 2021, 8:44:48 AM
to cython-users

How do I split the work across multiple cores with multithreading? Believe me, I only started with Cython a few days ago. I'm still learning and have these doubts. I'm completing a project with Cython. I'm really sorry if anyone is bothered by my questions.

Souvik Ghosh

Dec 18, 2021, 9:19:14 AM
to cython-users

You said that we use parallel() to split the work, but that takes longer than not using it. with nogil, parallel(): takes 0.07 secs, but with nogil: takes 0.03 secs.

da-woods

Dec 18, 2021, 10:58:22 AM
to cython...@googlegroups.com
What you would normally do is try to parallelize a loop using prange:

from cython.parallel cimport prange

cdef long subtractor_p(long a, long b):

    cdef:
        int i
        long ac= a
        long bc= b

    for i in prange(bc, nogil=True):
        ac-=i
    return ac

"ac" is used as an OpenMP reduction. Remove the "with nogil, parallel():" from the calling function - it is not needed or useful.
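What a reduction on "ac" means can be simulated in plain Python (again not Cython, and without real threads). A rough sketch, assuming the usual OpenMP behaviour for a "-" reduction: each thread gets a private copy starting at 0, and the private copies are added onto the original value after the loop. The function name reduce_like_openmp and the chunking scheme are hypothetical; the real schedule is chosen by OpenMP.

```python
def reduce_like_openmp(a, b, num_threads=4):
    """Simulate a reduction on ac with one contiguous chunk per (simulated) thread."""
    b = max(b, 0)
    chunk = (b + num_threads - 1) // num_threads  # ceiling division
    private_copies = []
    for t in range(num_threads):
        start = min(t * chunk, b)
        stop = min(start + chunk, b)
        ac_private = 0  # private copy starts at the identity value, not at a
        for i in range(start, stop):
            ac_private -= i
        private_copies.append(ac_private)
    # Combine step: the private results are added to the original value of ac.
    return a + sum(private_copies)
```

Because each simulated thread only writes to its own private copy, no two threads update the same variable at the same time, and the combined result matches the serial loop.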

You may find that the loop is too simple to benefit from parallelism. If this is the case then there is nothing you can do: the non-parallel version is as good as you can get.
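To see how little work each iteration does, note that the whole loop is an arithmetic series: it subtracts 0 + 1 + ... + (b - 1) = b(b - 1)/2 from a. A small plain-Python check of that identity follows; the name subtractor_closed_form is hypothetical, and the check ignores C integer overflow, which the Cython version with "long" would be subject to.

```python
def subtractor_closed_form(a, b):
    """Result of subtracting 0, 1, ..., b-1 from a, without a loop."""
    if b <= 0:
        return a  # the loop body never runs
    return a - b * (b - 1) // 2
```

With one subtraction per iteration, the cost of starting threads and combining their results can easily outweigh the loop itself, which is consistent with the timings reported above.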

Nothing I have shown here is very different from what's in the documentation https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html