Re: 4 pthreads - same performance as 1 pthread - Other cores not utilized [Tegra 3]


Shervin Emami

Oct 8, 2012, 7:59:29 PM
to andro...@googlegroups.com
When you say the compute is really long, do you mean on the order of microseconds, milliseconds or seconds? Depending on the circumstances, the system might not power up all 4 cores until it has been doing something CPU-intensive for tens or hundreds of milliseconds. And if your code is mostly waiting on something else, such as the GPU, RAM, SD card, network or other threads, then it probably doesn't need multiple cores.

Cheers,
Shervin.
Senior Systems Engineer, NVIDIA.


On Friday, October 5, 2012 11:09:59 PM UTC-7, llynx wrote:
My threading code looks like this, with standard static compute threads within a class:

for (int x = 0; x < 4; x++) pthread_create(&threads[x],NULL, &cm::computeX, &simulation);
for (int x = 0; x < 4; x++) pthread_join(threads[x], NULL);

The 4 compute threads are completely independent, and the compute is really long, so the overhead from starting the threads is low in comparison.

The threaded result runs at the same speed as the non-threaded result. Any suggestions?

Angel Segura

Oct 9, 2012, 1:51:47 PM
to andro...@googlegroups.com
Quite interesting. I asked the same question months ago and never got an answer.
Shervin's reply suggests that explicitly creating threads does not by itself matter, whatever your purpose is. Supposing that is true, how can we know at what point the system decides to bring up the other cores so that we gain the potential of threading? Could it be that the scheduler is configured in a particular way? I don't know, but it would be interesting to get to the bottom of this.

My observation back then was that the main thread monopolizes most of the execution time, while the spawned threads are left with whatever time remains to do their work. Try measuring the time spent in your threads and you will see.
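
One way to take that measurement, as a minimal sketch only: each thread records its own CPU time with CLOCK_THREAD_CPUTIME_ID and its elapsed time with CLOCK_MONOTONIC. The names WorkerStats, timed_worker and run_timed_workers are made up for illustration, and the busy loop is a stand-in for the real compute function, which the thread does not show.

```cpp
#include <pthread.h>
#include <time.h>
#include <cassert>

// Hypothetical per-thread record (not from the original post).
struct WorkerStats {
    long iterations;      // input: amount of busy work to do
    double cpu_seconds;   // output: CPU time consumed by this thread
    double wall_seconds;  // output: elapsed time between thread start and finish
};

// Reads the given clock and returns its value in seconds.
static double read_clock(clockid_t clock_id) {
    timespec ts;
    int rc = clock_gettime(clock_id, &ts);
    assert(rc == 0);
    (void)rc;
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

// Stand-in for the real compute function: burns CPU, then records its timings.
static void* timed_worker(void* arg) {
    WorkerStats* stats = static_cast<WorkerStats*>(arg);
    double wall_start = read_clock(CLOCK_MONOTONIC);
    double cpu_start = read_clock(CLOCK_THREAD_CPUTIME_ID);
    volatile double sink = 0.0;
    for (long i = 0; i < stats->iterations; i++) sink = sink + i * 0.5;
    stats->cpu_seconds = read_clock(CLOCK_THREAD_CPUTIME_ID) - cpu_start;
    stats->wall_seconds = read_clock(CLOCK_MONOTONIC) - wall_start;
    return NULL;
}

// Starts one thread per entry in stats[0..count) and waits for all of them.
// Returns 0 on success, -1 if count is out of range or a thread failed to start.
static int run_timed_workers(WorkerStats* stats, int count) {
    const int kMaxThreads = 16;
    pthread_t threads[kMaxThreads];
    if (count < 1 || count > kMaxThreads) return -1;
    int started = 0;
    while (started < count &&
           pthread_create(&threads[started], NULL, timed_worker, &stats[started]) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
    return started == count ? 0 : -1;
}
```

If a thread's wall_seconds is much larger than its cpu_seconds, that thread spent most of its life waiting (for a core or for something else) rather than computing. Note that the wall clock here only starts once the thread is actually running, so any delay before its first instruction is not included.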

2012/10/8 Shervin Emami <shervi...@gmail.com>
--
You received this message because you are subscribed to the Google Groups "android-ndk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/android-ndk/-/VwV9y0O5PVgJ.

To post to this group, send email to andro...@googlegroups.com.
To unsubscribe from this group, send email to android-ndk...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/android-ndk?hl=en.

Shervin Emami

Oct 9, 2012, 11:12:54 PM
to andro...@googlegroups.com
It's true that spawning multiple threads on a multi-core CPU does not guarantee that they will run on multiple cores; the hardware and/or OS decide that for you. Generally this happens automatically without you worrying about it, but if you do some multi-threaded processing for a short duration and expect it to always use all cores, you can get confusing results. This often happens when you time a single iteration of single-threaded vs multi-threaded code and are surprised that multi-threading is not faster. If you run the same test for a longer duration (e.g. 1 or 2 seconds), then it is safe to say that intensive multi-threaded code will be spread across all 4 cores. You might be lucky and find that your 20 ms of code runs on multiple cores (e.g. because they were already powered up by a heavy multi-core app such as the camera or web browser running at the same time), but for measuring performance you should use a long interval. This is recommended for normal performance testing anyway, including single-core code.
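
To see where threads actually end up, here is a rough sketch in which each thread spins for a fixed time and periodically records the core it is running on. It assumes sched_getcpu() is available (it is a Linux extension, not POSIX, so availability on a given NDK level is an assumption). The names CoreProbe, probe_worker and count_cores_used are made up for illustration.

```cpp
#include <pthread.h>
#include <sched.h>
#include <time.h>
#include <cassert>
#include <set>

// Hypothetical per-thread record (not from the original post).
struct CoreProbe {
    double run_seconds;        // input: how long to keep spinning
    std::set<int> cores_seen;  // output: every core index this thread observed
};

// Monotonic wall-clock time in seconds.
static double probe_now() {
    timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

// Spins until the deadline, sampling the current core after each chunk of work.
static void* probe_worker(void* arg) {
    CoreProbe* probe = static_cast<CoreProbe*>(arg);
    double deadline = probe_now() + probe->run_seconds;
    volatile unsigned long sink = 0;
    while (probe_now() < deadline) {
        for (int i = 0; i < 100000; i++) sink = sink + i;
        int cpu = sched_getcpu();  // returns -1 if the call is unsupported
        if (cpu >= 0) probe->cores_seen.insert(cpu);
    }
    return NULL;
}

// Runs thread_count spinning threads for run_seconds each.
// Returns the number of distinct cores observed across all threads, or -1 on error.
static int count_cores_used(int thread_count, double run_seconds) {
    const int kMaxThreads = 16;
    if (thread_count < 1 || thread_count > kMaxThreads) return -1;
    pthread_t threads[kMaxThreads];
    CoreProbe probes[kMaxThreads];
    int started = 0;
    for (; started < thread_count; started++) {
        probes[started].run_seconds = run_seconds;
        if (pthread_create(&threads[started], NULL, probe_worker, &probes[started]) != 0) break;
    }
    std::set<int> all_cores;
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        all_cores.insert(probes[i].cores_seen.begin(), probes[i].cores_seen.end());
    }
    return started == thread_count ? static_cast<int>(all_cores.size()) : -1;
}
```

Comparing count_cores_used(4, 0.02) with count_cores_used(4, 2.0) is one way to check whether a short run stays on fewer cores than a long one. The result depends on whatever else the device is doing at the time.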

To put it into perspective, let's say for simplicity that your OS scheduler runs just once every 10 milliseconds (this is common). If you create new threads, they probably won't even get a chance to start for roughly that long. Both the OS and the CPU hardware then have to detect, based on recent history, that it is worth powering up more cores rather than just increasing the clock frequency of the current cores (more cores will not be powered up unless it really looks worth it, since that results in higher power draw). If they do get powered up, there is a delay until the extra cores are ready, and only then are the threads you created moved onto them. So if each of these steps happens at, say, 10 millisecond intervals, it's not surprising that it can take hundreds of milliseconds for your code to be fully spread across 4 cores.
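
The ramp-up described above can be watched from user space, at least roughly. The sketch below polls sysconf(_SC_NPROCESSORS_ONLN) while a workload runs elsewhere. It assumes the platform reports powered-down cores as offline, which is not confirmed anywhere in this thread for Tegra 3. CoreCounts, read_core_counts and max_online_cores are made-up names.

```cpp
#include <time.h>
#include <unistd.h>
#include <cassert>

// Hypothetical snapshot of core counts (not from the original post).
struct CoreCounts {
    long configured;  // cores the system knows about
    long online;      // cores currently available to the scheduler
};

// Reads both counts once. These sysconf names are common extensions, not POSIX.
static CoreCounts read_core_counts() {
    CoreCounts counts;
    counts.configured = sysconf(_SC_NPROCESSORS_CONF);
    counts.online = sysconf(_SC_NPROCESSORS_ONLN);
    return counts;
}

// Polls the online-core count `samples` times, sleeping interval_ms between polls.
// Returns the highest count seen, or -1 if samples is not positive.
static long max_online_cores(int samples, int interval_ms) {
    assert(interval_ms >= 0);
    long highest = -1;
    for (int i = 0; i < samples; i++) {
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        if (online > highest) highest = online;
        timespec pause;
        pause.tv_sec = interval_ms / 1000;
        pause.tv_nsec = (interval_ms % 1000) * 1000000L;
        nanosleep(&pause, NULL);
    }
    return highest;
}
```

Calling max_online_cores(100, 10) from a separate thread while the compute threads run would sample for about one second at the 10 ms interval used in the explanation above.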

Like I said, running a test for at least 1 or 2 seconds should be a safe bet (either by repeating your test multiple times or by using bigger data). Depending on how parallel the code is, you can definitely get very close to a 4x speedup by using 4 cores, for example in camera image processing.
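
A test along those lines could look like the sketch below. It is an illustration under stated assumptions, not the benchmark anyone in this thread ran: instead of timing a fixed job, every thread spins for a fixed duration and counts how many chunks of work it finished, so the comparison is in work units per second. SpinJob, spin_worker and measure_throughput are made-up names, and the arithmetic loop is a placeholder for real compute.

```cpp
#include <pthread.h>
#include <time.h>
#include <cassert>

// Hypothetical per-thread job description (not from the original post).
struct SpinJob {
    double run_seconds;             // input: how long this thread should work
    unsigned long long units_done;  // output: chunks of work completed
};

// Monotonic wall-clock time in seconds.
static double spin_now() {
    timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

// Repeats a fixed-size chunk of arithmetic until the deadline passes.
static void* spin_worker(void* arg) {
    SpinJob* job = static_cast<SpinJob*>(arg);
    double deadline = spin_now() + job->run_seconds;
    volatile double sink = 0.0;
    unsigned long long units = 0;
    while (spin_now() < deadline) {
        for (int i = 0; i < 100000; i++) sink = sink + i * 0.5;
        units++;
    }
    job->units_done = units;
    return NULL;
}

// Runs thread_count workers for run_seconds each and returns the combined
// work units per second of wall-clock time, or -1.0 on error.
static double measure_throughput(int thread_count, double run_seconds) {
    const int kMaxThreads = 16;
    if (thread_count < 1 || thread_count > kMaxThreads || run_seconds <= 0.0) return -1.0;
    pthread_t threads[kMaxThreads];
    SpinJob jobs[kMaxThreads];
    double start = spin_now();
    int started = 0;
    for (; started < thread_count; started++) {
        jobs[started].run_seconds = run_seconds;
        jobs[started].units_done = 0;
        if (pthread_create(&threads[started], NULL, spin_worker, &jobs[started]) != 0) break;
    }
    unsigned long long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += jobs[i].units_done;
    }
    double elapsed = spin_now() - start;
    if (started != thread_count || elapsed <= 0.0) return -1.0;
    return total / elapsed;
}
```

The ratio measure_throughput(4, 2.0) / measure_throughput(1, 2.0) gives a speedup estimate over a 2 second window. Repeating it with a much shorter window, such as 0.02 seconds, is a way to check whether short runs really fall well short of 4x on a given device.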

Cheers,
Shervin.

fadden

Oct 10, 2012, 3:25:52 PM
to android-ndk
On Oct 9, 8:12 pm, Shervin Emami <shervin.em...@gmail.com> wrote:
> Like I said, running a test for atleast 1 or 2 seconds should be a safe bet
> (either by doing your test multiple times or on bigger data), and depending
> on how parallel the code is, you can definitely get very close to 4x
> speedup by using 4 cores, such as for camera image processing, etc.

Something like this: http://bigflake.com/cpu-spinner.c.txt

jeff shanab

Oct 10, 2012, 7:49:41 AM
to andro...@googlegroups.com
That it depends on the OS is very true. On the desktop it is more obvious (win32: all threads on the same core, period. Mac OS, mostly 64-bit, shares cores very nicely. Linux usually shares nicely too).
Threads within a process have an affinity for the same core, and synchronization primitives are less expensive on the same core.
One way to get better core balance might be to refactor the code to be multi-process instead of multi-threaded.
On one project I am using ZMQ. It lets you use a message queue to separate async tasks as threads, processes or different machines, simultaneously or with a single line of code change. I have not tried compiling ZMQ for Android yet.
