Consider,
const int MAX = 1024*1024;
float *arr = new float[MAX];
extern char normalize1[64];
Part 1:
Now, I need to execute this code:
for(int i = 0; i < MAX; ++i)
    arr[i] = arr[i] / normalize1[i % 64];
Then, Part 2:
{
    char modulo[64];
    memset(modulo, 0, sizeof(modulo));
    for(int i = 0; i < MAX; ++i)
        modulo[ some_slow_function(arr[i]) % 64 ]++;
    // some_slow_function is really slow. So, it would be best if we don't
    // wait for the first part to finish.
    Print_array(modulo);
}
So, how would we improve it on a quad proc machine? This is an
example I have cooked up based on something I have been thinking
about.
From what I have seen there is a lot of ambiguity regarding
multithreading, or maybe it is just me.
(Also, if someone could suggest a nice multithreading book, preferably
for Windows, that would be helpful. It must have some examples like the
one above.)
Anyway, please suggest your solutions.
My solution:
I have just two synchronization variables, an event and a critical
section object:
HANDLE AllDoneEvent;
CRITICAL_SECTION TotalThreadsdoneCS;
I have four threads. Two of them will take half each of the first
part. After they are done, they will update a simple variable
Thread1done=1 (similarly Thread2done=1).
Two more will take half each of the second part. After they are done,
they will update simple variables Thread3done=1 (similarly
Thread4done=1).
Each thread will also do the following at the end of its task:
EnterCriticalSection(&TotalThreadsdoneCS);
TotalThreadsdone++;
if (TotalThreadsdone == 4) SetEvent(AllDoneEvent);
LeaveCriticalSection(&TotalThreadsdoneCS);
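To make it concrete, each of the first two threads would look roughly like
this (just a sketch; Part1Worker and the half-index bookkeeping are names I
made up, and the two Part 2 threads would be analogous with
some_slow_function):

DWORD WINAPI Part1Worker(LPVOID p)
{
    int half  = (int)(INT_PTR)p;             // 0 or 1, passed in at CreateThread time
    int begin = half * (MAX / 2);
    int end   = begin + (MAX / 2);
    for (int i = begin; i < end; ++i)
        arr[i] = arr[i] / normalize1[i % 64];

    // common "I'm done" bookkeeping
    EnterCriticalSection(&TotalThreadsdoneCS);
    TotalThreadsdone++;
    if (TotalThreadsdone == 4) SetEvent(AllDoneEvent);
    LeaveCriticalSection(&TotalThreadsdoneCS);
    return 0;
}

The main thread would CreateThread four such workers and then
WaitForSingleObject(AllDoneEvent, INFINITE).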
First of all, will this work? Secondly, is this trivial? Are any
optimizations or tunings possible?
What other issues would you consider? Thanks
> I have four threads. Two of them will take half each of the first
> part. After they are done, they will update a simple variable
> Thread1done=1 (similarly Thread2done=1).
> Two more will take half each of the second part. After they are done,
> they will update simple variables Thread3done=1 (similarly
> Thread4done=1).
I think you are trying to do the scheduler's job. What if one half,
for whatever reason, actually ends up taking a lot longer than the
other half? For example, some CPUs have cores that share cache. What
if one thread gets a core that's sharing a cache with a busy core and
one gets a core that's sharing a cache with a non-busy core? One
thread may finish all its work really fast and then the other half is
being finished on only one core.
You should let whatever thread happens to be running take some of
whatever work happens to be left to do, if possible. In this case,
that's very possible. So why not do it?
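For example, for Part 1 you can just keep a shared cursor into the array
and let whichever thread is free claim the next chunk (a rough sketch;
CHUNK, nextIndex and NormalizeWorker are names I'm making up, and
InterlockedExchangeAdd is the Win32 atomic add):

const LONG CHUNK = 4096;          // small enough that nobody gets stuck with a huge tail
volatile LONG nextIndex = 0;      // shared cursor into arr

DWORD WINAPI NormalizeWorker(LPVOID)
{
    for (;;)
    {
        LONG start = InterlockedExchangeAdd(&nextIndex, CHUNK);  // claim the next chunk
        if (start >= MAX)
            return 0;                                            // nothing left to do
        LONG end = start + CHUNK;
        if (end > MAX) end = MAX;
        for (LONG i = start; i < end; ++i)
            arr[i] = arr[i] / normalize1[i % 64];
    }
}

A slow thread simply ends up claiming fewer chunks and a fast one claims
more, so the work balances itself.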
> Each thread will also do the following at the end of its task:
> EnterCriticalSection(&TotalThreadsdoneCS);
> TotalThreadsdone++;
> if (TotalThreadsdone == 4) SetEvent(AllDoneEvent);
> LeaveCriticalSection(&TotalThreadsdoneCS);
>
> First of all, will this work?
That should work. But what is the purpose of this event? If there's
something that needs to be done when all the work is done, WHY NOT
JUST DO IT?! Why signal an event that tells some other thread to do it
when this thread is already running and has nothing else to do?!
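That is, the last thread out can just do the final step itself, roughly
like this (a sketch, assuming the finished modulo array is visible to the
workers):

EnterCriticalSection(&TotalThreadsdoneCS);
int done = ++TotalThreadsdone;
LeaveCriticalSection(&TotalThreadsdoneCS);
if (done == 4)
    Print_array(modulo);   // we're already running, so just finish the job here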
> Secondly, is this trivial? Are any
> optimizations or tunings possible?
> What other issues would you consider? Thanks
It will work, and in most cases it will even work well (because in
most cases, it pretty much doesn't matter how well you do it). Unless
you are really pushing the limits, doing synchronization well is not
really all that important so long as it is correct. But you are doing
things in a very brute force way.
DS
> for whatever reason, actually ends up taking a lot longer than the
> other half? For example, some CPUs have cores that share cache. What
> if one thread gets a core that's sharing a cache with a busy core and
> one gets a core that's sharing a cache with a non-busy core? One
> thread may finish all its work really fast and then the other half is
> being finished on only one core.
>
> You should let whatever thread happens to be running take some of
> whatever work happens to be left to do, if possible. In this case,
> that's very possible. So why not do it?
>
> > Each thread will also do the following at the end of its task:
> > EnterCriticalSection(&TotalThreadsdoneCS);
> > TotalThreadsdone++;
> > if (TotalThreadsdone == 4) SetEvent(AllDoneEvent);
> > LeaveCriticalSection(&TotalThreadsdoneCS);
>
> > First of all, will this work?
>
> That should work. But what is the purpose of this event? If there's
> something that needs to be done when all the work is done, WHY NOT
> JUST DO IT?! Why signal an event that tells some other thread to do it
> when this thread is already running and has nothing else to do?!
>
I was thinking this thread would exit after this and would be
terminated.
I would suppose it is much cleaner if the main thread remains idle,
and 4 threads symmetrically come in, do their tasks, and exit. What do you
think?
It's of course a very minor thing in this case, but I feel it is an
important design decision, and when it comes to threading I am always
lost.
Perhaps you suggest the better design would be to let the main thread
handle one of these tasks (in which case only three threads would be
created). Do you think that is better? The fourth thread created
above makes it look clean and doesn't seem to have any overhead.
Any thoughts?
> > Secondly, is this trivial? Are any
> > optimizations or tunings possible?
> > What other issues would you consider? Thanks
>
> It will work, and in most cases it will even work well (because in
> most cases, it pretty much doesn't matter how well you do it). Unless
> you are really pushing the limits, doing synchronization well is not
> really all that important so long as it is correct. But you are doing
> things in a very brute force way.
>
> DS
Thanks
> > That should work. But what is the purpose of this event? If there's
> > something that needs to be done when all the work is done, WHY NOT
> > JUST DO IT?! Why signal an event that tells some other thread to do it
> > when this thread is already running and has nothing else to do?!
> I was thinking this thread would exit after this and would be
> terminated.
Why? There's still work to be done and there's a thread running.
> I would suppose it is much cleaner if the main thread remains idle,
> and 4 threads symmetrically come in, do their tasks, and exit. What do you
> think?
I think you should get the notion of a "main thread" out of your head.
Threads are a symmetric interface.
> It's of course a very minor thing in this case, but I feel it is an
> important design decision, and when it comes to threading I am always
> lost.
I agree, it's minor in this case. But you may encounter cases later
where it's not so minor. You should try not to force a context switch
from one thread to another where that context switch serves no
purpose.
> Perhaps you suggest the better design would be to let the main thread
> handle one of these tasks (in which case only three threads would be
> created). Do you think that is better? The fourth thread created
> above makes it look clean and doesn't seem to have any overhead.
Think about threads that loop like this:
1) Is there any work to do?
2) If not, wait for work, go to step 1.
3) Do some of the work.
4) Did the work we just did create new work that has to be done? If
so, add it to the list of work that needs to be done.
5) Go to step 1.
Don't try to assign work to specific threads or make specific threads
do specific tasks unless there's some specific reason to do that.
Because if you do that, you reduce the scheduler's flexibility and
that always has a cost.
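In Win32 terms, that loop might be sketched like this (WorkItem, Process,
workQueue, queueCS and workSem are all names I'm making up; the semaphore
holds one count per queued item, and shutdown is left out):

#include <windows.h>
#include <deque>

std::deque<WorkItem> workQueue;   // guarded by queueCS
CRITICAL_SECTION queueCS;
HANDLE workSem;                   // counted semaphore: one count per queued item

DWORD WINAPI Worker(LPVOID)
{
    for (;;)
    {
        WaitForSingleObject(workSem, INFINITE);   // 1,2) wait until there is work
        EnterCriticalSection(&queueCS);
        WorkItem item = workQueue.front();        // 3) take some of the work
        workQueue.pop_front();
        LeaveCriticalSection(&queueCS);

        WorkItem newItem;
        if (Process(item, &newItem))              // 4) did this create new work?
        {
            EnterCriticalSection(&queueCS);
            workQueue.push_back(newItem);
            LeaveCriticalSection(&queueCS);
            ReleaseSemaphore(workSem, 1, NULL);
        }
        // 5) go around again
    }
}

Any idle thread picks up whatever is next; nothing is pinned to a
particular thread.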
One of the hardest things is figuring out the right 'graininess' of the
work you're distributing. You don't want to cut 4,000 jobs into four
chunks just because you have four cores. What if one of those cores is
being used by some other process? You may finish three chunks and then
have to do an entire 1/4 of your work on only one core. Yuck.
Forget how many cores you have, that's the scheduler's problem. (And
you may or may not get all or any of those cores and that may vary
from moment to moment.) You divide your job into pieces based simply
on keeping the overhead of assigning pieces tolerable.
As a rule of thumb, the pieces of work that a thread picks up should
ideally take somewhere between 1/10th and 1/100th of a second. As a
rule of thumb, create a few more threads than the most cores you might
wind up on. (Perhaps a few extra if your threads may get blocked on
disk I/O, for example.)
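So the only numbers I would pull from the machine look something like this
(a sketch; the +2 is just the "few extra threads" rule of thumb, and the
chunk size comes from timing the work, not from the core count):

SYSTEM_INFO si;
GetSystemInfo(&si);
int threadCount = (int)si.dwNumberOfProcessors + 2;  // a few more than the cores we might get
int chunkSize   = 4096;  // sized so one chunk takes roughly 1/100th to 1/10th of a second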
DS
> Don't try to assign work to specific threads or make specific threads
> do specific tasks unless there's some specific reason to do that.
> Because if you do that, you reduce the scheduler's flexibility and
> that always has a cost.
Sometimes it can be beneficial for a thread to assign work to itself, which
is a "form of" assigning work to specific threads. Of course, this tends to
work best if you use some sort of work-stealing scheme...
;^)
[...]