pthread question

Alan Wolfe

Feb 9, 2012, 1:12:18 PM
to native-cli...@googlegroups.com
Hey Guys,

My app hangs on startup. I hit it maybe 1 time in 10, but some other people get it 100% of the time.

My pthread knowledge is a little lacking... Does anyone see any issues?

Here's the high level render loop:
1) In my Instance's init function, I call CallOnMainThread with a delay of 0, which calls my Update function for the first time.
2) In my Update function, I render to my Graphics2D context (render details below!) and then flush the 2D context.
3) In the flush callback, I call my Update function again, which renders and flushes again. Then I get another callback, render and flush again, and so on (this is essentially the render loop).

Rendering details (my rendering is multithreaded):

At init... (in the constructor... this is just setup for the render stuff)
* I create a "BackToSleep" Mutex and condition variable
* I get the number of processors on the system by calling sysconf(_SC_NPROCESSORS_ONLN)
* I create the same number of threads as there are processors.  For each one I...
  * Create a "WakeUp" Mutex and condition variable for this thread
  * In the parameters passed to the thread I put the pointer to the "BackToSleep" mutex and condition variable
  * In the parameters passed to the thread I also put the pointer to a variable called "ThreadsInFlight" along with a pointer to the mutex that protects that variable.
  * Lastly, I start the thread
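
A minimal sketch of the startup steps listed above, in C. The struct layout, the field names, and the empty `worker_main` body are assumptions made for illustration; the post does not show the real code. The one detail worth noting is that `sysconf` can return -1, so the count is clamped to at least one worker.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-thread parameter block; field names are illustrative. */
typedef struct {
    int index;                /* which worker this is */
    pthread_mutex_t wake_mu;  /* per-thread "WakeUp" mutex */
    pthread_cond_t  wake_cv;  /* per-thread "WakeUp" condition variable */
} worker_params;

/* Placeholder body so the sketch links; the real render loop would go here. */
static void *worker_main(void *arg) {
    (void)arg;
    return NULL;
}

/* Starts one worker per online processor. Returns the number of threads
 * started (0 on allocation failure) and hands back the two arrays. */
static long start_workers(pthread_t **threads_out, worker_params **params_out) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1) n = 1;  /* sysconf can return -1; fall back to one worker */

    pthread_t *threads = calloc((size_t)n, sizeof *threads);
    worker_params *params = calloc((size_t)n, sizeof *params);
    if (!threads || !params) {
        free(threads);
        free(params);
        return 0;
    }

    long started = 0;
    for (long i = 0; i < n; i++) {
        params[i].index = (int)i;
        pthread_mutex_init(&params[i].wake_mu, NULL);
        pthread_cond_init(&params[i].wake_cv, NULL);
        if (pthread_create(&threads[i], NULL, worker_main, &params[i]) != 0)
            break;  /* stop at the first failure; earlier threads keep running */
        started++;
    }
    *threads_out = threads;
    *params_out = params;
    return started;
}
```

The parameter array has to outlive the threads, which is why it is heap-allocated here rather than placed on the constructor's stack.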

In my rendering, the screen is broken up into tiles, where there's a queue of tiles to render and the threads just grab the next tile index to render as they are looking for work.  When they are out of work, they go back to sleep.
When rendering (main thread)...
  * I set a variable NextCellToRender to the number of screen tiles that there are
  * I set the "ThreadsInFlight" variable to the number of threads (this write is protected by the "ThreadsInFlight" mutex.  It doesn't need to be, but it doesn't hurt... the "ThreadsInFlight" value is actually part of a ThreadSafeUINT class)
  * I call pthread_cond_signal on each thread's "WakeUp" condition variable to wake up all the threads.
  * I then lock my "BackToSleep" mutex and wait on the "BackToSleep" condition variable, unlocking the "BackToSleep" mutex afterwards.
  * Now that the threads have rendered all the screen tiles (which each have their own pixel buffer for their region of the screen), I gather up all the tiles and copy them onto the REAL pixel buffer (the one that gets flushed to the screen)

Render Thread Function...
  * I start out by casting the thread parameters to the appropriate struct type to get the parameters passed to the thread
  * While(1)...
    * Lock the "WakeUp" mutex, wait on the "WakeUp" condition variable, then unlock the "WakeUp" mutex (i.e. it should sleep until woken up by the main thread)
    * I have a function that locks the "NextCellToRender" mutex, decrements the value if it is greater than zero, unlocks the mutex, and returns true if the number was > 0 before the decrement, else false.  The thread calls this in a while loop (basically: while there is work to do, keep going)
    * Inside the while loop, it renders the current cell to the pixel buffer owned by the current screen cell
    * After the while loop that renders the screen cells, it locks the "ThreadsInFlight" mutex, decrements ThreadsInFlight if it's > 0, and unlocks the mutex.
    * If this thread is the one that decremented ThreadsInFlight to 0, that means it's the final thread that is awake, so it signals the "BackToSleep" condition variable, which should let the main thread wake up again and continue on its way.
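
The tile-claiming helper described in the list above could look roughly like this in C. The names `tile_counter` and `claim_next_cell` are hypothetical, and handing the claimed index back through an out-parameter is an assumption; the post only says the function returns true or false.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names: a shared countdown of tiles still unclaimed. */
typedef struct {
    pthread_mutex_t mu;
    int next_cell_to_render;  /* tiles not yet claimed this frame */
} tile_counter;

/* Claims one tile. Returns true and stores its index in *cell_out if a
 * tile was available; returns false once every tile has been claimed. */
static bool claim_next_cell(tile_counter *tc, int *cell_out) {
    bool claimed = false;
    pthread_mutex_lock(&tc->mu);
    if (tc->next_cell_to_render > 0) {
        tc->next_cell_to_render--;
        *cell_out = tc->next_cell_to_render;  /* indices run n-1 down to 0 */
        claimed = true;
    }
    pthread_mutex_unlock(&tc->mu);
    return claimed;
}
```

Because the test and the decrement happen under one lock, two threads can never claim the same tile, so this part of the design is safe on its own.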

What do you guys think... do you see any dangerous areas that could be causing a lock?

Victor Khimenko

Feb 9, 2012, 3:14:24 PM
to native-cli...@googlegroups.com
The algorithm looks quite complex, and much will depend on the details of the implementation. For example, how do you handle spurious wakeups? What happens if your main thread wakes up all the other threads, which find that there is nothing to do and try to wake up the main thread (which has not yet gone to sleep)?

The locking algorithm looks well separated from the rest of your program, so you can try to test it in Relacy: http://www.1024cores.net/home/relacy-race-detector

--
You received this message because you are subscribed to the Google Groups "Native-Client-Discuss" group.
To post to this group, send email to native-cli...@googlegroups.com.
To unsubscribe from this group, send email to native-client-di...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/native-client-discuss?hl=en.

Bennet Yee (余仕斌)

Feb 9, 2012, 3:38:32 PM
to native-cli...@googlegroups.com
the last thread to wake up doesn't have to be the last thread to finish.  so if that thread finishes before some other, earlier woken-up thread is done, it will wake up the main thread to possibly do another frame.  this means that it might try to awaken a worker thread that isn't asleep yet.

in general, condvars should be used to indicate that a condition might possibly be true -- code should check the condition by testing shared variables (while holding the lock), and wait on the condvar again if the condition is not yet true.  it's also not clear from the description whether the code defends against spurious wakeups.

--
bennet s yee
i usually don't capitalize due to mild tendonitis
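
To make the "test the shared variable while holding the lock" advice concrete, here is a hedged C sketch of the "BackToSleep" side only. The names `done_latch`, `worker_done`, `worker_entry`, and `wait_all_done` are invented for illustration. The decrement and the signal sit under the same mutex the main thread waits on, and the main thread loops on the predicate, so a signal sent before the main thread reaches its wait is not lost.

```c
#include <pthread.h>

/* Hypothetical completion counter guarding the "BackToSleep" condition. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int threads_in_flight;  /* the predicate is threads_in_flight == 0 */
} done_latch;

/* Worker side: decrement and signal under the mutex the waiter uses,
 * so the waiter cannot miss the transition to zero. */
static void worker_done(done_latch *d) {
    pthread_mutex_lock(&d->mu);
    if (d->threads_in_flight > 0 && --d->threads_in_flight == 0)
        pthread_cond_signal(&d->cv);
    pthread_mutex_unlock(&d->mu);
}

/* Thread entry point that does nothing except report completion. */
static void *worker_entry(void *arg) {
    worker_done(arg);
    return NULL;
}

/* Main-thread side: test the predicate before and after every wait. If the
 * workers already finished, this returns without sleeping at all. */
static void wait_all_done(done_latch *d) {
    pthread_mutex_lock(&d->mu);
    while (d->threads_in_flight != 0)
        pthread_cond_wait(&d->cv, &d->mu);
    pthread_mutex_unlock(&d->mu);
}
```

With a bare `pthread_cond_wait` and no loop, a worker that finishes before the main thread starts waiting would leave the main thread asleep forever, which is one way to get a startup hang that depends on timing.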

Alan Wolfe

Feb 9, 2012, 3:45:50 PM
to native-cli...@googlegroups.com
What do you guys mean by spurious wakeups?
 
Also, it doesn't assume the last thread to wake up is the last thread to finish.  After each thread is done doing all the work it can (the queue is empty), it locks a mutex protecting a var and decrements the var.  If the var was decremented to zero by that thread, it means it's the "last one out", so to speak, and it signals the main thread to wake back up.

Thank you for taking the time to read through my stuff and understand how it works. It means a heck of a lot to me, since I'm having difficulty reproducing or solving this issue myself!

Alan Wolfe

Feb 9, 2012, 3:52:29 PM
to native-cli...@googlegroups.com
Also, if you guys have a simpler solution, I would love to hear it (:
 
In a nutshell, here's what I aim to do:
 
#1 - Have N worker threads (where N = number of cores on the machine)
#2 - A thread safe queue containing all the cells that need to be rendered
#3 - The main thread wakes up the worker threads, and blocks until they are finished doing all the work.
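
One possible simpler shape for those three goals, sketched in C under stated assumptions: a single mutex guards all shared state, workers sleep on one shared condition variable instead of one per thread, and the predicates are plain counters. Every name here (`pool`, `render_frame`, `stop_workers`, and so on) is hypothetical, `render_tile` is a stand-in for the real rendering, and `n_tiles` is assumed not to exceed `MAX_TILES`. This is not the code from the thread.

```c
#include <pthread.h>
#include <string.h>

#define MAX_TILES 1024

/* All names are hypothetical. One mutex guards every shared field. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  work_cv;  /* main -> workers: tiles are available */
    pthread_cond_t  done_cv;  /* workers -> main: the frame is complete */
    int tiles_left;           /* tiles not yet claimed in this frame */
    int tiles_busy;           /* tiles claimed but not yet finished */
    int quit;                 /* set to ask the workers to exit */
    int rendered[MAX_TILES];  /* stand-in for the per-tile pixel buffers */
} pool;

static void pool_init(pool *p) {
    memset(p, 0, sizeof *p);
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->work_cv, NULL);
    pthread_cond_init(&p->done_cv, NULL);
}

/* Stand-in for real rendering: counts how often each tile was drawn. */
static void render_tile(pool *p, int tile) { p->rendered[tile]++; }

static void *worker(void *arg) {
    pool *p = arg;
    pthread_mutex_lock(&p->mu);
    for (;;) {
        /* Predicate loop: survives spurious wakeups and early broadcasts. */
        while (p->tiles_left == 0 && !p->quit)
            pthread_cond_wait(&p->work_cv, &p->mu);
        if (p->quit) break;
        int tile = --p->tiles_left;  /* claim a tile under the lock */
        p->tiles_busy++;
        pthread_mutex_unlock(&p->mu);
        render_tile(p, tile);        /* render outside the lock */
        pthread_mutex_lock(&p->mu);
        if (--p->tiles_busy == 0 && p->tiles_left == 0)
            pthread_cond_signal(&p->done_cv);  /* last tile of the frame */
    }
    pthread_mutex_unlock(&p->mu);
    return NULL;
}

/* Main thread: publish n_tiles, wake every worker, block until all done. */
static void render_frame(pool *p, int n_tiles) {
    pthread_mutex_lock(&p->mu);
    p->tiles_left = n_tiles;
    pthread_cond_broadcast(&p->work_cv);
    while (p->tiles_left != 0 || p->tiles_busy != 0)
        pthread_cond_wait(&p->done_cv, &p->mu);
    pthread_mutex_unlock(&p->mu);
}

/* Asks every worker to leave its loop; join the threads afterwards. */
static void stop_workers(pool *p) {
    pthread_mutex_lock(&p->mu);
    p->quit = 1;
    pthread_cond_broadcast(&p->work_cv);
    pthread_mutex_unlock(&p->mu);
}
```

A worker that is still finishing the previous frame when the next broadcast arrives does not need to hear it: when it loops around it sees `tiles_left > 0` and starts working without sleeping. That is the property the signal-only design lacks.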

Bennet Yee (余仕斌)

Feb 9, 2012, 4:00:13 PM
to native-cli...@googlegroups.com
hi

On Thu, Feb 9, 2012 at 12:45 PM, Alan Wolfe <alan....@gmail.com> wrote:
> What do you guys mean by spurious wakeups?
 

pthread_cond_wait can return even when the condition is not true.  allowing this makes the implementation of mutexes and condvars simpler.  thus, "typical" code usually looks something like:

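/* body of worker thread loop */
pthread_mutex_lock(&this_thread->mu);
/* re-check the predicate after every wakeup, spurious or not */
while (NULL == (work_item = this_thread->dequeue_work())) {
  /* atomically releases mu while asleep and re-acquires it before returning */
  pthread_cond_wait(&this_thread->cv, &this_thread->mu);
}
pthread_mutex_unlock(&this_thread->mu);
do_work(work_item);  /* run the work item without holding the lock */
/* make result available / announce done */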
so that if there's no work due to a spurious wakeup, the thread just goes right back to waiting for work.
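
The snippet above covers only the waiting side. A hedged sketch of a matching queue in plain C follows, including the producer side, which changes the shared state under the lock and then signals. The type `work_queue`, the functions `enqueue_work` and `dequeue_work_blocking`, and the fixed capacity are all assumptions for illustration; they are not the `dequeue_work` from the message.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical fixed-size FIFO of work items; one mutex guards all fields. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    void *items[QUEUE_CAP];
    size_t head;   /* index of the oldest item */
    size_t count;  /* number of queued items */
} work_queue;

/* Producer: update the shared state under the lock, then signal.
 * Returns 0 on success, -1 if the queue is full. */
static int enqueue_work(work_queue *q, void *item) {
    int rc = -1;
    pthread_mutex_lock(&q->mu);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = item;
        q->count++;
        pthread_cond_signal(&q->cv);
        rc = 0;
    }
    pthread_mutex_unlock(&q->mu);
    return rc;
}

/* Consumer: blocks until an item is available, re-testing the predicate
 * after every return from pthread_cond_wait. */
static void *dequeue_work_blocking(work_queue *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->cv, &q->mu);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return item;
}
```

If an item is enqueued before any consumer waits, the consumer finds `count != 0` and never sleeps, so the ordering of signal and wait stops mattering.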
 
> Also, it doesn't assume the last thread to wake up is the last thread to finish.  After each thread is done doing all the work it can (the queue is empty), it locks a mutex protecting a var and decrements the var.  If the var was decremented to zero by that thread, it means it's the "last one out", so to speak, and it signals the main thread to wake back up.

ah, my apologies.  i got the ordering for the steps confused.

Alan Wolfe

Feb 9, 2012, 4:02:27 PM
to native-cli...@googlegroups.com
Oh wow, I figured they would wait until signaled, no matter how long that took.
 
Well then... I definitely have some threading issues LOL.
 
I'm going to try to fix that up and see where it gets me.
 
Thank you so much!

Victor Khimenko

Feb 9, 2012, 5:26:54 PM
to native-cli...@googlegroups.com
On Fri, Feb 10, 2012 at 1:02 AM, Alan Wolfe <alan....@gmail.com> wrote:
> oh wow, i figured they would wait infinitely.

Hmm... The fact that there are spurious wakeups is written right in the pthread_cond_wait description: 


When using condition variables there is always a boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_wait() or pthread_cond_timedwait() functions may occur. Since the return from pthread_cond_wait() or pthread_cond_timedwait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.

Alan Wolfe

Feb 9, 2012, 5:38:03 PM
to native-cli...@googlegroups.com
Yeah... I didn't read the documentation well enough though.  I saw the description of the function, thought "neat! that's what I'm looking for", and assumed how it would work :P
 
My bad, but I'm more educated now.