I just commented out all of the OpenMP pragmas, and the application started to behave predictably: it is much faster, and performance increases as the number of cores given to OpenBLAS increases. It looks like OMP_NUM_THREADS is simply ignored, which is expected. The question remains, however: what is the correct way to use OpenMP?
Hi Xianyi, I have a problem similar to Sergey's, but my application contains both of the cases you mentioned. Do you have any specific suggestion for that situation?
What's more, you mentioned "For OpenBLAS with USE_THREAD=1 and USE_OPENMP=1, it uses OpenMP to parallel the function. Therefore, OpenBLAS only depends on OMP_NUM_THREADS." I'm confused about what it actually means for OpenBLAS to use OpenMP to parallelize a function. Can you give a detailed example of this?
Thanks Xianyi. One more question. In my application I also use the environment variable OMP_NUM_THREADS to control other multi-threaded parallelization. Does this mean the environment variable is shared by both OpenBLAS and my other code?
Also, I call the run-time function openblas_set_num_threads(1) to set the number of OpenBLAS threads to 1. Will that call be ignored in this case?
In my mind, the perfect situation would be this: when I use
--------------------
#pragma omp parallel for
for(...) {
    cblas_sgemv(...);
}
--------------------
OpenBLAS runs in single-threaded mode, and when I use
--------------------
#pragma omp parallel for
for(...) {
    ...
}
cblas_sgemv(...);
--------------------
OpenBLAS runs in multi-threaded mode.
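To make the two cases concrete, here is a minimal sketch of one way they could be written by setting the thread count explicitly around each case. It rests on assumptions: HAVE_OPENBLAS is a hypothetical macro you would define yourself when building against OpenBLAS (without it the code falls back to a plain loop), the helper names set_blas_threads, sgemv, sgemv_batch and sgemv_single are made up for illustration, and whether openblas_set_num_threads is honored in a USE_OPENMP=1 build is exactly the open question above, so treat this as something to try rather than a confirmed answer.

```c
#include <assert.h>
#include <stddef.h>
#ifdef HAVE_OPENBLAS   /* hypothetical macro: define it when building against OpenBLAS */
#include <cblas.h>     /* cblas_sgemv(), openblas_set_num_threads() */
#endif

/* Set the OpenBLAS thread count; a no-op in the fallback build. */
static void set_blas_threads(int n)
{
    assert(n >= 1);
#ifdef HAVE_OPENBLAS
    openblas_set_num_threads(n);
#else
    (void)n;
#endif
}

/* y = A * x for a row-major rows x cols matrix. */
static void sgemv(int rows, int cols, const float *A, const float *x, float *y)
{
#ifdef HAVE_OPENBLAS
    cblas_sgemv(CblasRowMajor, CblasNoTrans, rows, cols,
                1.0f, A, cols, x, 1, 0.0f, y, 1);
#else
    /* Plain-C stand-in so the sketch still builds without OpenBLAS. */
    for (int i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < cols; ++j)
            acc += A[(size_t)i * cols + j] * x[j];
        y[i] = acc;
    }
#endif
}

/* Case 1: many small products, parallelized by the caller's OpenMP loop.
   BLAS is asked for one thread before the parallel region is entered. */
static void sgemv_batch(int count, int rows, int cols,
                        const float *A, const float *xs, float *ys)
{
    set_blas_threads(1);
    #pragma omp parallel for
    for (int k = 0; k < count; ++k)
        sgemv(rows, cols, A, xs + (size_t)k * cols, ys + (size_t)k * rows);
}

/* Case 2: a single product outside any parallel region.
   BLAS is asked for up to max_threads threads. */
static void sgemv_single(int max_threads, int rows, int cols,
                         const float *A, const float *x, float *y)
{
    set_blas_threads(max_threads);
    sgemv(rows, cols, A, x, y);
}
```

The thread count is changed outside the parallel region on purpose: it is a process-wide setting, so changing it from inside the loop would have every OpenMP thread writing the same global value.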