High-level multithreading using c++11 feature std::thread conflicts with OpenBLAS multithreading?

johnny b

Mar 5, 2015, 6:04:11 AM
to openbla...@googlegroups.com
Hello,

I am doing high-level multithreading with the C++11 feature std::thread on an Odroid U3 and an Odroid XU3, both of which have an ARM CPU.
I am using OpenBLAS for matrix multiplications and other operations.
I have one main thread which starts a worker thread (using std::thread). Then some work is done in parallel in the main thread (I call it foreground processing) and in the worker thread (I call it background processing) before the two are synchronized again (using std::thread::join).
Both foreground and background processing call BLAS functions.
I measure three timings: foreground processing, background processing, and the time between "before starting the worker thread" and "after synchronizing main and worker thread"; let's call the last one timer algo.
So timer algo = max(timer foreground processing, timer background processing) + overhead
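
A minimal sketch of this measurement setup, to make the three timers concrete. It is not my actual code: fake_blas_work is a hypothetical stand-in (a naive matrix multiply) for the real BLAS calls, and the names Timings, seconds_since and run_once are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the real BLAS calls: a naive n x n matrix multiply.
double fake_blas_work(std::size_t n) {
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c[0];  // returned so the compiler cannot drop the loop
}

// Seconds elapsed since `start`, measured on a monotonic clock.
double seconds_since(Clock::time_point start) {
    return std::chrono::duration<double>(Clock::now() - start).count();
}

struct Timings {
    double foreground = 0.0;  // work done in the main thread
    double background = 0.0;  // work done in the worker thread
    double algo = 0.0;        // from before thread start to after join
    double checksum = 0.0;    // keeps both results observable
    // overhead = timer algo - max(timer foreground, timer background)
    double overhead() const { return algo - std::max(foreground, background); }
};

Timings run_once(std::size_t n) {
    assert(n > 0);
    Timings t;
    double bg_result = 0.0;
    const auto algo_start = Clock::now();

    // Background processing in the worker thread.
    std::thread worker([&t, &bg_result, n] {
        const auto start = Clock::now();
        bg_result = fake_blas_work(n);
        t.background = seconds_since(start);
    });

    // Foreground processing in the main thread, in parallel with the worker.
    const auto fg_start = Clock::now();
    const double fg_result = fake_blas_work(n);
    t.foreground = seconds_since(fg_start);

    worker.join();  // synchronize main and worker thread
    t.algo = seconds_since(algo_start);
    t.checksum = fg_result + bg_result;
    return t;
}
```

Calling run_once(n) and reading overhead() from the result gives the overhead I am talking about. It includes thread creation, the join, and any time either thread spends waiting for a core.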

Now, if I use OpenBLAS with multithreading, "timer foreground processing" and "timer background processing" are slightly lower than with OpenBLAS without multithreading.
But the overhead gets really high if I use OpenBLAS with multithreading. This is not the case if I use OpenBLAS without multithreading (the overhead is very small).

For "OpenBLAS with multithreading", I compile the library simply with default settings.
For "OpenBLAS without multithreading", I changed the following entries in the Makefile.rule file before compiling:
USE_THREAD = 0
USE_OPENMP = 0
NUM_THREADS = 1
COMMON_OPT = -O3

Do you have any hints or explanations for why the overhead is so high when using OpenBLAS with multithreading in conjunction with std::thread?

Thanks a lot in advance,
Johannes

Zhang Xianyi

Mar 5, 2015, 3:40:10 PM
to johnny b, openbla...@googlegroups.com
Hi Johannes,

Thank you for your report.

How did you control the number of threads for the multithreaded version of OpenBLAS?
I wonder whether it was a conflict between threads.

I think Odroid-U3 contains 4 ARM cores. Did you use 1 foreground thread + 3 background threads or 1 foreground thread + 1 background thread?
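
To illustrate what I mean by a thread conflict: if each of your two application threads calls a BLAS routine that itself tries to use all 4 cores, the threads may end up competing for the same cores. A rough sketch of how one might split the cores between the application threads is below. The helper name blas_threads_per_app_thread is hypothetical, and an even split is only an assumption, not a tuned recommendation.

```cpp
#include <algorithm>

// Hypothetical helper: how many BLAS threads each application thread may use
// so that the total does not exceed the number of cores.
unsigned blas_threads_per_app_thread(unsigned cores, unsigned app_threads) {
    if (app_threads == 0) {
        return std::max(1u, cores);  // no application threads: use every core
    }
    // Integer division, but never fewer than one BLAS thread.
    return std::max(1u, cores / app_threads);
}

// The result could then be passed to openblas_set_num_threads() (declared in
// OpenBLAS's cblas.h) before the BLAS calls, or set for the whole process
// through the OPENBLAS_NUM_THREADS environment variable.
```

For 4 cores and 2 application threads (foreground + background) this gives 2 BLAS threads each. Setting the count to 1 at run time should also let you compare against your single-threaded build without recompiling the library.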


Xianyi
